Reproducible Research Course by Eric C. Anderson (NOAA/SWFSC)


Course Introduction

Welcome and Introductions

Who is Eric and Why is he teaching this course?

  • Statistician who specializes in genetic data
  • Has used R (or S) since 1998, but didn’t really understand it until recently
    • Was more of a C programmer
  • Disliked R for most of the last 16 years
  • Taught an R course two years ago
    • When I finally learned R, I learned to love it
  • Seeing R incorporated into reproducible research excites me

What the heck is reproducible research?

  • In computational sciences and data analysis, let’s go with this definition:
    • The data and code used to make a finding are available, and are presented in such a way that it is (relatively) straightforward for an independent researcher to recreate the finding.
  • This actually seldom happens. See, for example, two interesting articles by Tim Vines.
  • Scientific articles have fairly detailed methods sections, but those are typically insufficient to actually reproduce an analysis.
  • Scientists owe it to themselves and their community to have an explicit record of all the steps in an analysis done at a computer.

How should we do reproducible research?

  • How do you efficiently record what it is you have done at your computer?
  • There are many ways to do this, but we will focus on four tools:
    1. R – open source, free, industry-standard analysis software
    2. RStudio – open source, free environment for working with R
    3. git – open source, free version control software (because research is never really linear when it is happening)
    4. GitHub – a website with tools that make it very easy to collaborate with others using git

Course Organization

  • Weekly format
    • Tuesdays: R and RStudio
    • Thursdays: writing reports/articles with R Markdown; using git and GitHub
  • However, the course is integrated!
    • It would be hard to attend only the Tuesday or only the Thursday sessions.
    • We will use git to obtain and submit R homework, etc.
    • This is how it works in real life too…

Course Website

Note that you can see the raw R Markdown source used to write each page.

Course Philosophy

Focus on the practical

  • Two years ago my course was focused on R as a programming language
  • Having satisfied my curiosity about that, this year I mostly want to teach people how to use R as a practical, useful tool

We have a big group!

  • Quick show of hands for NMFS Fed, NMFS Contractor, grad student, PCS, MBARI, others?
  • How about the range of experience with R and git?
  • It is great that there is so much interest
  • But it will dictate a few things about the course:
    1. I won’t be able to answer everyone’s every question
      • “It takes a village”
    2. My vision: we are a dedicated community of researchers that will work together, helping one another become fluent in these tools. So:
      • If a practical in-class exercise or a homework assignment seems easy to you because you are already well-versed in R, then take a moment to help someone who is being challenged by it!
      • If you are new to all this, find someone in the class that can help you!
    3. Please add comments to the Disqus comment boxes on each page
      • Sign up with a reasonable identifier
      • Let’s crowd source solutions to issues that come up for people in the course!
      • Particularly for PC users (which I am not…)

Your contributions are invited!

  • Every page of this course is editable
  • If you see something that needs changing, you can do it and send me a pull request
  • This will make much more sense after a few more weeks.

Homework/Assignments

  • There will be homework! (As long as I have time to write more…)
  • You can’t learn these methods without doing and using them
  • The amount of homework should not be overwhelming.
  • I won’t be grading everything, but:
    • I will establish a “peer-review” system for homework
    • Everyone will both do homework and “referee” other people’s homework
    • Doing this will let us get intimately familiar with how to use git and GitHub.
  • Experienced and advanced R users:
    • Please contribute assignments and homework problems
    • Once we have gotten the rudiments of GitHub down, I will make a place for you to submit new homework problems

Any Questions?

