Course Introduction
Welcome and Introductions
Who is Eric and Why is he teaching this course?
- Statistician that specializes in genetic data
- Has used R (or S) since 1998, but didn’t really understand it until recently
- Was more of a C programmer
- Disliked R for most of the last 16 years
- Taught an R course two years ago
- When I finally learned R, I learned to love it
- Seeing R incorporated into Reproducible Research excites me
What the heck is reproducible research?
- In computational sciences and data analysis, let’s go with this definition:
- the data and code used to make a finding are available and they are presented in such a way that it is (relatively) straightforward for an independent researcher to recreate the finding.
- This actually seldom happens. Consider two interesting articles by Tim Vines:
- The Availability of Research Data Declines Rapidly with Article Age “of 516 articles published between 2 and 22 years ago…the odds of a data set being extant fell by 17% per year.”
- Recommendations for utilizing and reporting population genetic analyses: the reproducibility of genetic clustering using the program structure “we reanalysed data sets gathered from papers using the software package ‘structure’… 30% of analyses were unable to reproduce the same number of population clusters.”
- Scientific articles have fairly detailed methods sections, but those are typically insufficient to actually reproduce an analysis.
- Scientists owe it to themselves and their community to have an explicit record of all the steps in an analysis done at a computer.
How should we do reproducible research?
- How do you efficiently record what it is you have done at your computer?
- There would be lots of ways, but we will look at using 4 tools:
- R – open source, free, industry-standard analysis software
- RStudio – open source, free, environment for working with R
- git – open source, free version control software (because research is never really linear when it is happening)
- GitHub – website with tools making it very easy to collaborate with people using git.
Course Organization
- Weekly format
- Tuesdays: R and RStudio
- Thursdays: How to write reports/articles using Rmarkdown. Using git and GitHub
- However the course is integrated!
- Would be hard to just do Tu or just Th.
- We will use git to obtain and submit R homeworks, etc.
- This is how it works in real life too…
Course Website
- Main Page: http://eriqande.github.io/rep-res-web/
- Syllabus: http://eriqande.github.io/rep-res-web/syllabus.html
Note that you can see the raw rmarkdown source used to write each page.
Course Philosophy
Focus on the practical
- Two years ago my course was focused on R as a programming language
- Having satisfied my curiosity about that, this year I mostly want to teach people how to use R, practically, as a useful tool
We have a big group!
- Quick show of hands for NMFS Fed, NMFS Contractor, grad student, PCS, MBARI, others?
- How about the range of experience with R and git?
- Great that we have that much interest
- But, it will dictate a few things about the course:
- I won’t be able to answer everyone’s every question
- “It takes a village”
- My vision: we are a dedicated community of researchers that will work together, helping one another become fluent in these tools. So:
- If a practical in-class exercise, or a homework assignment seems easy to you because you are already well-versed in R, then take a moment to help someone that is being challenged by it!
- If you are new to all this, find someone in the class that can help you!
- Please add comments to the DISQUS comment boxes on each page
- Sign up with a reasonable identifier
- Let’s crowd source solutions to issues that come up for people in the course!
- Particularly for PC users (which I am not…)
- I won’t be able to answer everyone’s every question
Your contributions are invited!
- Every page of this course is editable
- If you see something that needs changing, you can do it and send me a pull request
- This will make much more sense after a few more weeks.
Homework/Assignments
- There will be homework! (As long as I have time to write more…)
- You can’t learn these methods without doing and using them
- Amounts of HW should not be overwhelming.
- I won’t be grading everything, but:
- I will establish “peer-review” system of homework
- Everyone will do homework and “referee” homework
- Doing this will let us get intimately familiar with how to use git and GitHub.
- Familiar and advanced R users:
- Please contribute assignments and homework problems
- Once we have gotten the rudiments of GitHub down I will make a place for you to submit new homework problems
Any Questions?
comments powered by Disqus