Chapter 2 Week One Meeting
Tina is going to be helping everyone get their systems all set up. After that we will have everyone clone an RStudio project from GitHub to see how easy that is.
2.1 Software Installation
RStudio: We want the latest “development” version of RStudio because it has features that we may want to use during this course. Get it from https://www.rstudio.com/products/rstudio/download/preview/ and install the appropriate one for your OS.
R: Let’s make sure that we are all using the latest version of R. On March 7, 2017, version 3.3.3 was released. Go to https://cran.r-project.org/ and find the download link for your computer system. Download it and install it.
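If you are not sure which version of R you ended up with, you can check from the R console. The sketch below is only an illustration: the helper name r_is_current is made up for these notes, while getRversion() is part of base R.

```r
# Hypothetical helper: TRUE if the running R is at least the given version.
r_is_current <- function(required = "3.3.3") {
  getRversion() >= required
}

if (r_is_current()) {
  message("R ", as.character(getRversion()), " is recent enough.")
} else {
  message("R ", as.character(getRversion()), " is older than 3.3.3; please update.")
}
```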
bookdown: This package is what I used to create these course notes. Installing it automatically installs many other packages that are useful for authoring reproducible research. We want the latest development version, which can be obtained from GitHub by issuing the following commands at the R prompt (i.e., in the console window of RStudio):
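A minimal sketch of what a development install from GitHub usually looks like is given here. It rests on assumptions that are not stated in these notes: that the devtools package is used to do the installing, and that the development version lives in the rstudio/bookdown repository. The function name install_dev_bookdown is made up for this illustration.

```r
# ASSUMPTIONS: devtools as the installer and "rstudio/bookdown" as the
# repository are not named in the notes; the function name is hypothetical.
install_dev_bookdown <- function(repo = "rstudio/bookdown") {
  # Get devtools from CRAN first if it is not already installed.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Install the development version of the package from GitHub.
  devtools::install_github(repo)
}

# To actually run the installation, call this at the console:
# install_dev_bookdown()
```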
Install other packages that we are going to be needing in the first few weeks. If you don’t know how to install packages, ask Tina and she can show you. Install:
- Make sure that git is up and running on your system.
If you are using a Mac with a reasonably new OS, you should be able to just open the Terminal application (/Applications/Utilities/Terminal) and type “git” at the command line. If you have git, it will say something that starts like:
    usage: git [--version] [--help] [-C <path>] [-c name=value]
               [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
               [-p | --paginate | --no-pager] [--no-replace-objects] [--bare]
               [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
               <command> [<args>]

    These are common Git commands used in various situations:

    start a working area (see also: git help tutorial)
       clone      Clone a repository into a new directory

    etc. etc. etc.
If you do not have git, then it should pop up a little thing asking if you would like to install a reduced set of developer tools. You do. Click OK. NOTE: Instead of a pop-up, it might say something like, “xcrun Error: invalid active developer path. etc. etc…”. In that case, you can install a fresh set of command line tools by typing this at the command line:
- If you are using a PC, I can’t be as much help, but you can find links with instructions on how to download git for a PC here.
If you are using Linux then we will assume you know how to get git or that you already have it.
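Whatever your operating system, it is also worth confirming that R itself can find git, since RStudio needs to be able to see it. The following is a small sketch using only base R functions; the variable name git_path is just for illustration.

```r
# Ask R where the git executable is; the result is "" if it is not found.
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  message("Found git at: ", git_path)
  # Print the version string that git reports.
  print(system2("git", "--version", stdout = TRUE))
} else {
  message("git was not found on the PATH that R sees.")
}
```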
2.2 Get an account on GitHub
If you don’t already have an account on GitHub, go to github.com and click the “sign up” link near the upper right of the page. It is pretty self-explanatory. Go ahead and get a free account. There is nothing to pay for here!
2.2.1 Private repositories
If you are a graduate student and you do not feel comfortable posting your data on a public site like GitHub, then you should request some private repositories from GitHub. GitHub has a great deal for academic users like students: free private repositories. Please go to https://education.github.com/pack to sign up for your free student pack.
2.3 Open an RStudio Project from GitHub
I am going to have everyone use RStudio and GitHub to clone and open an RStudio project that I prepared as a template so that people can see how I would like them to start putting together their own projects.
To open this project, from RStudio, go to the menu option “File->New Project…”. Then from the resulting dialog, choose “Version Control”. Then choose “Git”. Then it asks for a “repository URL”. Supply this:
https://github.com/eriqande/rep-res-coho-example
Leave the “Project Directory Name” empty, then choose a directory in which to put the project and click OK.
Bam! That will pull the RStudio project off of GitHub, make a local clone of it on your hard drive, and open it.
Once you have done that, open README.Rmd within the project and click the “Knit” button, which should be present near the top left of the editor window.
That is how you convert an R Markdown README to README.md, which is easy to read on GitHub.
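The Knit button is a front end for the rmarkdown package, so roughly the same conversion can be run from the console. This is a minimal sketch, assuming that the rmarkdown package is installed and that the working directory is the top level of the project; the variable name out is just for illustration.

```r
# Path of the rendered file, or NULL if nothing was rendered.
out <- NULL

if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists("README.Rmd")) {
  # The output format is taken from the YAML header of README.Rmd.
  out <- rmarkdown::render("README.Rmd")
  message("Wrote ", out)
} else {
  message("Need the rmarkdown package and a README.Rmd in the working directory.")
}
```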
If you want to see what the project repository looks like on GitHub, have a look at https://github.com/eriqande/rep-res-coho-example.
2.4 Assignment for next week: Create an RStudio Project with Your Own Data
Your mission for the following week—i.e., please have this done (or as done as you can get it) by Friday, April 14, 2017—is to prepare an RStudio project with your own data set, and provide some background about the data and the ways that you would like to analyze it. The “rep-res-coho-example” is an example of what I have in mind for this. You should use the README.Rmd from that project as a template for your own README.Rmd. (To do this you can just copy the README.Rmd file into the top level of your project directory and then edit it to reflect your own data and project.)
To do all this you are going to want to make your own project. Do that like this:
- In RStudio, choose “File->New Project…”
- Then choose “New Directory” and then choose “Empty Project”
- In the next dialog, choose a name for it (it is best to use only letters, numbers, dashes, and underscores, with no spaces) and be sure to check the “Create a git repository” box.
- Then click “Create Project”.
That should give you a new project. Here are some guidelines for putting your own data in there:
- Put all of your data in a directory named data in your project.
- CSV (comma-separated values) is probably the best format to use. It is text-readable without proprietary software (unlike an Excel file); however, if you need to look at it in a tabular way with Excel (gasp), you can do that easily. Tab-delimited text works if you have that, but CSV is preferred.
- Use only letters, numbers, dashes, and underscores for the file names, (and periods for their extensions, i.e.,
- Give a brief description of your data in the README.Rmd.
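The guidelines about the data directory and file names can be checked from the R console. The sketch below assumes that the working directory is the top level of your project; the helper name is_clean_name and the example file names in the comments are made up for this illustration.

```r
# Create the data directory if it does not exist yet.
dir.create("data", showWarnings = FALSE)

# Hypothetical helper: TRUE for names made only of letters, numbers,
# dashes, underscores, and periods (e.g. "coho_lengths-2017.csv").
is_clean_name <- function(x) grepl("^[A-Za-z0-9._-]+$", x)

# List the files in data/ and report any names that break the guideline.
data_files <- list.files("data")
bad <- data_files[!is_clean_name(data_files)]

if (length(bad) > 0) {
  message("Consider renaming: ", paste(bad, collapse = ", "))
} else {
  message("All file names in data/ look fine.")
}
```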
2.5 Reading for next week
This week (before Friday, April 14, 2017), please read the following sections of the R for Data Science book:
- Workflow basics: super basic review on how R works.
- Workflow: projects: info about organizing RStudio projects.
- Workflow: scripts: how to evaluate code in scripts.
- tibbles: a streamlined data frame format.
- data import: This is our key reading for the week.
When you are done with the Data Import reading, take a whack at writing some code to read the data files in your project into a variable (or several variables).
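As a starting point, reading one CSV file into a variable might look like the sketch below. Everything specific in it is made up so that the example runs on its own: the file name example_fish.csv, the column names, and the values. It uses readr::read_csv() (the function covered in the Data Import reading) when the readr package is installed, and falls back to base R's read.csv() otherwise. For your own project you would point it at a file such as one inside your data directory.

```r
# Write a tiny stand-in CSV to a temporary location (made-up values).
csv_path <- file.path(tempdir(), "example_fish.csv")
writeLines(
  c("id,length_mm,river",
    "F01,612,Klamath",
    "F02,580,Trinity"),
  csv_path
)

# Read the file into a variable, preferring readr when it is available.
if (requireNamespace("readr", quietly = TRUE)) {
  fish <- readr::read_csv(csv_path)
} else {
  fish <- read.csv(csv_path, stringsAsFactors = FALSE)
}

# Look at what was read in.
str(fish)
```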