Phase I: The Basics

In this phase, we’ll learn the basics of data analysis by working through 5 learning modules devoted to core topics in the field. This will most likely be the most challenging part of the course because many, if not most of you will not have ever designed and built instruments and data acquisition systems, let alone analyzed data from them. Rest assured, your experience will be curated, sticking to the bare bones of instrument design and data science. This introductory phase will set the groundwork for Phase II, the independent development of sensors and data analysis.

Module 1: Data Pirates Code with. . . . R, Introduction to the Basics

In this module we’ll tackle where to find resource you’ll need on a week-to-week, module-by-module basis. In addition, we’ll work on the first important topic of the course: an introduction to R, an ever more popular language for data analysis and visualization. We’ll focus on the basics of importing and tidying data in R. By importing, we mean loading data collected by scientists and stored in text files, the common currency of scientific analysis. By tidying, we mean preparing data for analysis.

Important Dates

Wednesday, 8/28: Establish GitHub account and add your handle to this form

Wednesday 9/3: WCR 1

Sunday, 9/8: Commit Module 1 Project report

All assignments are due at 11:59 on the specified date.

Key Concepts

Course workflow
Course expectations
R user interface
Working directory
Objects, functions, arguments, and scripts in R
Lists, data frames, and matrices
Basic R notation
The for loop
git, github, and version control

Learning Objectives

Upon completion of this week and module, students will be able to:

Know what is expected during Phase I of the course
Open R and RStudio
Set a working directory and load data in R
Recognize the difference between different objects (e.g., vectors, lists, data frames etc.)
Perform simple data operations and manipulation of data objects in R
Construct and run a for loop in R to automate a interatively redundant task.
clone a local directory and version controlteam-based work on common script and report files

Read/Watch

From Hands on Programming with R (HOPR)

When reading through HOPR, make sure to have R and R Studio up and running so that you can copy and paste code from HOPR into an R file and run the specified commands and complete the books project tasks. This will help tremendously in getting the hang of things.

From Happy Git with R

Chapter 12

Be sure you’re familiar with the prerequsites outlined in Chapter 12. We’ll cover git and github in class on August 28th. Just introduce yourself to the concepts.

Check out the new github access token requirements.

This is a lot of reading and fiddling with R. Don’t worry too much about understanding every detail! The goal here is to just get the hang of things and, just as importantly, know where to find answers to questions about how R works.

Do

Post one and respond to one question each week of Module 1 in our discussion board
Install R and RStudio
Clone a github team repository to a local directory (done in class with Prof. K’s help on Wednesday, Aug 28th)
Complete and commit a report for Module 1 Project

Guiding Questions

How do I perform simple operations in R?
What are R’s basic object classes?
How does external data get loaded into R?
How does one subset common data objects in R?
How does a for loop work and when should it be used?
What are the basic git opertations and their value in sharing code.

Module 2: Data Wrangling: Introduction to Tools of the Trade in Data Analysis

This module is intended to give students the background and skills to begin simple data tidying and transformation operations. We need our data to be tidy before we can analyze it, that is, structured in a way that makes analysis possible. In addition, often the data we have aren’t exactly what we want to analyze. In this module, we’ll focus on how to read, tidy, and merge data.

Important dates

Sunday, 9/15: commit Module 2 Project report

Key Concepts

Reading text files in R
Adding data based on data (i.e., data transformation)
Summarize data according to variables
Tidying data in R
Relational data and merging data sets

Learning Objectives

Upon completion of this week and module, students will be able to:

Read text data into R.
Pivot data from wide to longer formats and vice versa.
Add columns of data to a data set based on values in other columns.
Write and use a simple function in R.
Tidy a data set to begin analysis with a particular goal in mind.
Join data sets based on similar data (i.e., key) attributes

Read/Watch

All form R for Data Science (R4DS):

Chapter 5: Transformations
All from the Wrangle Section of R4DS:
- Chapter 9: Introduction
- Chapter 10: Tibbles
- Chapter 11: Data Import [don’t worry too much about 11.3 and 11.4 “Parsing Vectors” and “Parsing Files]
- Chapter 12: Tidy Data
- Chapter 13: Relational Data
Chapter 19.1: Functions

As with HOPR, make sure to have R and RStudio up and running so that you can copy and paste code from R4DS into an R file and run the specified commands and complete the book’s exercises.

Do

Post one and respond to one question in discussion board [10 points]
Complete and commit a report for Module 2 Project

Guiding Questions

How does one read data into R?
What is tidy data?
What’s the difference between long and wide data tables?
How does one add and summarize data in R?
How does one merge data in join operations?

Module 3: The Whiz and Viz Bang of Data: The Basics of Data Visualization and Modeling

In Module 3, we’ll cover the basics of visualizing and modeling data and hypothesis testing. We’ve used some simple visualization techniques in ggplot already and we’ll leverage this introduction to dig deeper into how data can be plotted to explore and ask informed questions about it. In addition, we’ll learn how and when to compared the fit of models.

Important Dates

Sunday9/22: Commit Module 3 Project report

Key Concepts

The grammar of ggplot
Linear models and least squares regression
Phylogenetic signal and correction
Brownian motion models of trait evolution
Model fit and the Aikiake Information Criterion (AIC)

Learning Objectives

Upon completion of this week and module, students will be able to:

Perform simple modeling operations in R.
Explore patterns to construct these models by visualizing data in R’s ggplot environment.
Perform phylogenetically corrected regression analysis of comparative data.
Compare model fit using an information theory approach.

Read/Watch

All form R for Data Science (R4DS):

Chapter 3:Data Visualiation
Section IV: Models
- Chapter 22: Introduction
- Chapter 23: Model Basics
- Chapter 24: Model Building

From Phylogenetic Comparative Methods by Luke Harmon [don’t worry too much about the mathematics, focus on the definitions and concepts]

Chapter 1.2:The Roots of Comparative Methods
Chapter 2.1-2.3: Fitting Statistical Models to Data
Chapter 3.1-3.4:Introduction to Brownian Motion
Chapter 6.4: Non-Brownian evolution under stabilizing selection

Do

Post one and respond to one question in discussion board [10 points]
Complete and commit a report for Module 3 Project

Guiding Questions

What are the basic components of plotting in ggplot?
How does one use basic visualization techniques to propose questions in biology?
How does one construct simple models in R?
How does one account for phylogeny when constructing models of comparative data?
What basic models of trait evolution can one use in comparative research?
How does one know if they’re asking the right question (i.e., using the right model) in comparative research?

Module 4: Making Messes Pretty: Leveraging Markdown to Produce Pleasing Reports

In Module 4, we’ll cover the basics of authoring reports in R Markdown, a markup language used to communicate the results of analysis in R. With R Markdown, you’ll develop a framework for compiling ideas, code, and graphics in one reproducible document.

Important dates

Sunday, 9/29: Commit Module 4 Project Report

Key Concepts

The R Markdown framework
Text markup
Code chunks
Graphics and tables in R Markdown
Knitting documents in R Markdown

Learning Objectives

Upon completion of this week and module, students will be able to:

Author a simple R Markdown.
Run code within an R Markdown document.
Add graphics (i.e., figures), images, tables and references to and R Markdown document.
Render (i.e., knit) this R Markdown into HTML and PDF documents.

Read/Watch

Chapter 27 in R4DS:R Markdown
R Studio’s markdown tutorial

This module is very much learn as you go. Please dive right into the project and come back to the readings when clarifications are needed.

Do

Post one and respond to one question in discussion board [10 points]
Complete and commit a report for Module 3 Project

Guiding Questions

What are the basic components of an .Rmd?
How does one included headers, marked up text, graphics, tables, and references in an .Rmd?
How does one render and .Rmd into an HTML document?

Phase II: Developing a Toolbox

In Phase II, we’ll use our new-found instrument- and code-development skills to address key questions in organismal biology. These will be self-guided explorations of instrument design and use and data analysis meant to challenge you to put these skills to good use in answering how vertebrates muscles work, phylogeny matters in physiological systems, and how a changing climate impacts the distribution of vertebrates. Within this phase we’ll break up into our teams to focus on 4 modules, some data driven, the others both data and instrument driven.

Module 5: Shark!

This module puts our newly formed data analysis and markdown skills to use in answering how body shape evolved in the Selachii (i.e., sharks). Specifically, we’ll test whether body shape has evolved differently between sharks inhabiting different habitats.

Important Dates:

Thursday, 10/10: Upload data Sunday, 10/20: Commit Module 5 Project Report

Key Concepts

Geometic Morphometrics
Multivariate analysis
Evolutionary rate
Phylognetically informed regression
Parallel computing

Learning Objectives

Upon completion of this module, students will be able to:

Understand the basics of geometric amorphometrics.
Assemble a large dataset of comparative shape information.
Comment on how accounting for phylogenetic relationships affects analysis in shape studies.

Read/Watch

Do

Download and install FIJI (Fiji Is Just imageJ)
Post one and respond to one question in discussion board
Upload team data files to this directory by Tuesday, October 8th
Complete and commit Module 6 Project report

Guiding Questions

How does one acquire shape information for comparative analysis?
Did shape evolve differently between the pelagic and benthic sharks?
How does one formulate a complete .Rmd report of Phase II of the course?

Module 6: Spatial Awareness

Important Dates

Sunday, 11/10: Commit Module 6 Project Report

Key Concepts

Spatial analysis
Raster vs. shape data
Spatial autocorrelation
Random Forest regression

Learning Objectives

Understand the basics of map creation
Plot raster and shape data
Combine raster and shape data to assess geospatial patterns
Assess whether spatial data is random or autocorrelated
Use proportion data in a random forest regression analysis

Read/Watch

Chapters 1-3, 7 and 8 in Spatial Data Science
Chapters 5 and 8 in Spatial Statistics for Data Science: Theory and Practice with R
Random Forest tutorial

Do

11/10: Commit Module 6 Project Report

Guiding Questions

What are the differences between raster and shape dats?
How does one plot simple, but pretty maps?
How does a geospatial scientist assess autocorrelation of spatial data?
How does one assemble and assess a random forest regression model?

Module 7: Birds of a Feather Migrate Together . . . Sometimes: Using EBird data to Explore the Phenology of Passerine Migration (data driven)

In this module, we’ll explore the phenology of spring arrival timing for neotropical passerines as it relates to historical weather conditions. To do so, we’ll download species occurrence data from the Global Biodiversity Information Facility (GBIF), an international network and data infrastructure whose goal is to provide users open access to data about all types of life on Earth. We’ll access GBIF data using rgif, an R package that permits access to GBIF’s application programming interface (API). An API permits a user to interact with multiple software pieces (databases, code sources for processing and analysis, etc.). To assemble historical weather data, we’ll also use another R package, rnoa. This package contains functions that interact with NOAA’s National Climatic Data Center’s API.

Important Dates

Sunday, 11/24: Commit Module 7 Project report

Key Concepts

API
Phenology
Migrations
Linear Mixed-effects Modeling

Learning Objectives

Upon completion of this week and module, students will be able to:

Download data using R’s interface to one of several APIs.
Apply linear mixed-effect models to biological data.
Comment on how climatic and meteorological conditions may effect bird poulaiton.

Read/Watch

Do

Post one and respond to one question in discussion board
Complete and commit Module 9 Project report

Guiding Questions

How does one download data using interfaces developed for R?
Is something other the temperature important in explaining the arrival time of neotropical migrants in MA?

Phase III: Final Projects

Imporant Dates

Wednesday,12/16: Commit Final Project report

In Phase III, you’ll be tasked with quickly developing a project of your own design that either (1) requires the fabrication of a new or modified data acquisition system or (2) the acquisition of data from a peer-reviewed study or database. The goal of this project is to answer a question personal and interesting to your team. The question should be new within the context of the course and reflect a physiological focus (i.e., not ). This project will require you to first delve into the the scientific literature to set the stage and then leverage your new data-analysis and report-writing skills in the synthesis of 4-5 page markdown. This final project report should follow the same format and requirements of the reports produced in Phase II, but with expanded content.

More details about the final project and its expectations can be found here.

Course Details

Phase I: The Basics

Module 1: Data Pirates Code with. . . . R, Introduction to the Basics

Important Dates

Key Concepts

Learning Objectives

Read/Watch

Do

Guiding Questions

Module 2: Data Wrangling: Introduction to Tools of the Trade in Data Analysis

Important dates

Key Concepts

Learning Objectives

Read/Watch

Do

Guiding Questions

Module 3: The Whiz and Viz Bang of Data: The Basics of Data Visualization and Modeling

Important Dates

Key Concepts

Learning Objectives

Read/Watch

Do

Guiding Questions

Module 4: Making Messes Pretty: Leveraging Markdown to Produce Pleasing Reports

Important dates

Key Concepts

Learning Objectives

Read/Watch

Do

Guiding Questions

Phase II: Developing a Toolbox

Module 5: Shark!

Important Dates:

Key Concepts

Learning Objectives

Read/Watch

Do

Guiding Questions

Module 6: Spatial Awareness

Important Dates

Key Concepts

Learning Objectives

Read/Watch

Do

Guiding Questions

Module 7: Birds of a Feather Migrate Together . . . Sometimes: Using EBird data to Explore the Phenology of Passerine Migration (data driven)

Important Dates

Key Concepts

Learning Objectives

Read/Watch

Do

Guiding Questions

Phase III: Final Projects

Imporant Dates