Phase I: The Basics

In this phase, we’ll learn the basics of data analysis by working through 5 learning modules devoted to core topics in the field. This will most likely be the most challenging part of the course because many, if not most, of you will never have designed and built instruments and data acquisition systems, let alone analyzed data from them. Rest assured, the experience will be curated and will stick to the bare bones of instrument design and data science. This introductory phase will lay the groundwork for Phase II, the independent development of sensors and data analysis.

Module 1: Data Pirates Code with . . . R: Introduction to the Basics

In this module, we’ll cover where to find the resources you’ll need on a week-to-week, module-by-module basis. In addition, we’ll work on the first important topic of the course: an introduction to R, an increasingly popular language for data analysis and visualization. We’ll focus on the basics of importing and tidying data in R. By importing, we mean loading data collected by scientists and stored in text files, the common currency of scientific analysis. By tidying, we mean preparing data for analysis.

Important Dates

Wednesday, 8/28: Create a GitHub account and add your handle to this form

Wednesday, 9/3: WCR 1

Sunday, 9/8: Commit Module 1 Project report

All assignments are due at 11:59 PM on the specified date.

Key Concepts

  • Course workflow
  • Course expectations
  • R user interface
  • Working directory
  • Objects, functions, arguments, and scripts in R
  • Lists, data frames, and matrices
  • Basic R notation
  • The for loop
  • Git, GitHub, and version control

Learning Objectives

Upon completion of this week and module, students will be able to:

  • Know what is expected during Phase I of the course
  • Open R and RStudio
  • Set a working directory and load data in R
  • Recognize the differences among R object types (e.g., vectors, lists, and data frames)
  • Perform simple data operations and manipulation of data objects in R
  • Construct and run a for loop in R to automate a repetitive task
  • Clone a repository to a local directory and use version control for team-based work on shared script and report files

Read/Watch

From Hands-On Programming with R (HOPR)

When reading through HOPR, make sure to have R and RStudio up and running so that you can copy and paste code from HOPR into an R file, run the specified commands, and complete the book’s project tasks. This will help tremendously in getting the hang of things.

From Happy Git with R

Be sure you’re familiar with the prerequisites outlined in Chapter 12. We’ll cover Git and GitHub in class on August 28th; for now, just familiarize yourself with the concepts.

Check out GitHub’s new access token requirements.

This is a lot of reading and fiddling with R. Don’t worry too much about understanding every detail! The goal here is to just get the hang of things and, just as importantly, know where to find answers to questions about how R works.


Do

  • Post one question and respond to one question on our discussion board each week of Module 1
  • Install R and RStudio
  • Clone a GitHub team repository to a local directory (done in class with Prof. K’s help on Wednesday, Aug 28th)
  • Complete and commit a report for Module 1 Project

Guiding Questions

  • How do I perform simple operations in R?
  • What are R’s basic object classes?
  • How does external data get loaded into R?
  • How does one subset common data objects in R?
  • How does a for loop work and when should it be used?
  • What are the basic Git operations, and why are they valuable for sharing code?

Module 2: Data Wrangling: Introduction to Tools of the Trade in Data Analysis

This module is intended to give students the background and skills to begin simple data tidying and transformation operations. We need our data to be tidy, that is, structured in a way that makes analysis possible, before we can analyze them. In addition, the data we have often aren’t exactly what we want to analyze and must first be transformed. In this module, we’ll focus on how to read, tidy, transform, and merge data.

Important Dates

Sunday, 9/15: Commit Module 2 Project report

Key Concepts

  • Reading text files in R
  • Adding data derived from existing data (i.e., data transformation)
  • Summarizing data by variables
  • Tidying data in R
  • Relational data and merging data sets

Learning Objectives

Upon completion of this week and module, students will be able to:

  • Read text data into R.
  • Pivot data from wide to long formats and vice versa.
  • Add columns of data to a data set based on values in other columns.
  • Write and use a simple function in R.
  • Tidy a data set to begin analysis with a particular goal in mind.
  • Join data sets based on shared data (i.e., key) attributes.

Read/Watch

All from R for Data Science (R4DS):


As with HOPR, make sure to have R and RStudio up and running so that you can copy and paste code from R4DS into an R file, run the specified commands, and complete the book’s exercises.


Do

  • Post one question and respond to one question on the discussion board [10 points]
  • Complete and commit a report for Module 2 Project

Guiding Questions

  • How does one read data into R?
  • What is tidy data?
  • What’s the difference between long and wide data tables?
  • How does one add and summarize data in R?
  • How does one merge data in join operations?


Module 3: The Whiz and Viz Bang of Data: The Basics of Data Visualization and Modeling

In Module 3, we’ll cover the basics of visualizing and modeling data and of hypothesis testing. We’ve already used some simple visualization techniques in ggplot, and we’ll build on this introduction to dig deeper into how plotting data helps us explore them and ask informed questions about them. In addition, we’ll learn how and when to compare the fit of models.

Important Dates

Sunday, 9/22: Commit Module 3 Project report

Key Concepts

  • The grammar of ggplot
  • Linear models and least squares regression
  • Phylogenetic signal and correction
  • Brownian motion models of trait evolution
  • Model fit and the Akaike Information Criterion (AIC)

Learning Objectives

Upon completion of this week and module, students will be able to:

  • Perform simple modeling operations in R.
  • Visualize data in R’s ggplot environment to explore the patterns used to construct these models.
  • Perform phylogenetically corrected regression analysis of comparative data.
  • Compare model fit using an information theory approach.

Read/Watch

All from R for Data Science (R4DS):

From Phylogenetic Comparative Methods by Luke Harmon [don’t worry too much about the mathematics, focus on the definitions and concepts]

Do

  • Post one question and respond to one question on the discussion board [10 points]
  • Complete and commit a report for Module 3 Project

Guiding Questions

  • What are the basic components of plotting in ggplot?
  • How does one use basic visualization techniques to propose questions in biology?
  • How does one construct simple models in R?
  • How does one account for phylogeny when constructing models of comparative data?
  • What basic models of trait evolution can one use in comparative research?
  • How does one know if they’re asking the right question (i.e., using the right model) in comparative research?



Module 4: Making Messes Pretty: Leveraging Markdown to Produce Pleasing Reports

In Module 4, we’ll cover the basics of authoring reports in R Markdown, a markup language used to communicate the results of analysis in R. With R Markdown, you’ll develop a framework for compiling ideas, code, and graphics in one reproducible document.

Important Dates

Sunday, 9/29: Commit Module 4 Project Report

Key Concepts

  • The R Markdown framework
  • Text markup
  • Code chunks
  • Graphics and tables in R Markdown
  • Knitting documents in R Markdown

Learning Objectives

Upon completion of this week and module, students will be able to:

  • Author a simple R Markdown document.
  • Run code within an R Markdown document.
  • Add graphics (i.e., figures), images, tables, and references to an R Markdown document.
  • Render (i.e., knit) this R Markdown into HTML and PDF documents.

Read/Watch

This module is very much learn as you go. Please dive right into the project and come back to the readings when clarifications are needed.


Do

  • Post one question and respond to one question on the discussion board [10 points]
  • Complete and commit a report for Module 4 Project

Guiding Questions

  • What are the basic components of an .Rmd?
  • How does one include headers, marked-up text, graphics, tables, and references in an .Rmd?
  • How does one render an .Rmd into an HTML document?



Phase II: Developing a Toolbox

In Phase II, we’ll use our new-found instrument- and code-development skills to address key questions in organismal biology. These self-guided explorations of instrument design, instrument use, and data analysis are meant to challenge you to put these skills to good use in answering how vertebrate muscles work, how phylogeny matters in physiological systems, and how a changing climate affects the distribution of vertebrates. Within this phase, we’ll break up into our teams to focus on 4 modules, some data-driven and the others both data- and instrument-driven.

Module 5: Shark!

This module puts our newly formed data analysis and R Markdown skills to use in answering how body shape evolved in the Selachii (i.e., sharks). Specifically, we’ll test whether body shape has evolved differently in sharks inhabiting different habitats (e.g., pelagic vs. benthic).

Important Dates

Thursday, 10/10: Upload data

Sunday, 10/20: Commit Module 5 Project Report

Key Concepts

  • Geometric morphometrics
  • Multivariate analysis
  • Evolutionary rate
  • Phylogenetically informed regression
  • Parallel computing

Learning Objectives

Upon completion of this module, students will be able to:

  • Understand the basics of geometric morphometrics.
  • Assemble a large dataset of comparative shape information.
  • Comment on how accounting for phylogenetic relationships affects analysis in shape studies.

Do

Guiding Questions

  • How does one acquire shape information for comparative analysis?
  • Did shape evolve differently between the pelagic and benthic sharks?
  • How does one formulate a complete .Rmd report of Phase II of the course?

Module 6: Spatial Awareness

Important Dates

Sunday, 11/10: Commit Module 6 Project Report

Key Concepts

  • Spatial analysis
  • Raster vs. shape data
  • Spatial autocorrelation
  • Random Forest regression

Learning Objectives

  • Understand the basics of map creation
  • Plot raster and shape data
  • Combine raster and shape data to assess geospatial patterns
  • Assess whether spatial data is random or autocorrelated
  • Use proportion data in a random forest regression analysis

Do

  • Commit Module 6 Project Report by Sunday, 11/10

Guiding Questions

  • What are the differences between raster and shape data?
  • How does one plot simple, but pretty maps?
  • How does a geospatial scientist assess autocorrelation of spatial data?
  • How does one assemble and assess a random forest regression model?

Module 7: Birds of a Feather Migrate Together . . . Sometimes: Using eBird Data to Explore the Phenology of Passerine Migration (data driven)

In this module, we’ll explore the phenology of spring arrival timing for neotropical passerines as it relates to historical weather conditions. To do so, we’ll download species occurrence data from the Global Biodiversity Information Facility (GBIF), an international network and data infrastructure whose goal is to provide users open access to data about all types of life on Earth. We’ll access GBIF data using rgbif, an R package that provides access to GBIF’s application programming interface (API). An API is a defined interface through which one piece of software (here, R) can request data or services from another (e.g., a remote database). To assemble historical weather data, we’ll also use another R package, rnoaa, which contains functions that interact with the API of NOAA’s National Climatic Data Center.

Important Dates

Sunday, 11/24: Commit Module 7 Project report

Key Concepts

  • API
  • Phenology
  • Migration
  • Linear Mixed-effects Modeling

Learning Objectives

Upon completion of this week and module, students will be able to:

  • Download data using R’s interface to one of several APIs.
  • Apply linear mixed-effects models to biological data.
  • Comment on how climatic and meteorological conditions may affect bird populations.

Do

  • Post one question and respond to one question on the discussion board
  • Complete and commit Module 7 Project report

Guiding Questions

  • How does one download data using interfaces developed for R?
  • Is something other than temperature important in explaining the arrival time of neotropical migrants in MA?

Phase III: Final Projects

Important Dates

Wednesday, 12/16: Commit Final Project report

In Phase III, you’ll be tasked with quickly developing a project of your own design that either (1) requires the fabrication of a new or modified data acquisition system or (2) relies on data acquired from a peer-reviewed study or database. The goal of this project is to answer a question that is personal and interesting to your team. The question should be new within the context of the course and reflect a physiological focus (i.e., not ). This project will require you to first delve into the scientific literature to set the stage and then leverage your new data-analysis and report-writing skills to synthesize a 4-5 page R Markdown report. This final project report should follow the same format and requirements as the reports produced in Phase II, but with expanded content.

More details about the final project and its expectations can be found here.