Reproducible Research Course by Eric C. Anderson for (NOAA/SWFSC)


R and Rstudio Orientation

Let’s learn about:

  1. The idea of RStudio “Projects”
  2. The layout of the RStudio IDE
  3. How the R console works

Rstudio…

  • Is an IDE (integrated development environment) that sits on top of R and makes it easier to interact with R.
  • Organizes your work into R in neatly-contained packages of work (typically data and code) called “projects”
  • Nothing mysterious about these—just collections of files stored together in a single directory on your computer.

Let’s make a new project

Do This in RStudio:

  1. Select File -> New Project
  2. Create project from: Choose “New Directory”
    1. Choose “Empty Project”
    2. Give the directory name as project1
    3. and put it somewhere in your home directory (browse to where you want to put it in the “create project as subdirectory of:”)
    4. For now, unselect buttons to create a git repository or to use packrat.
    5. Hit Create Project

Now Explore:

  • Notice that RStudio has started R in your console in the project directory:
getwd()  # this tells you the current directory that R is in

# here is what it is on my computer:
[1] "/Users/eriq/project1"
  • Paste the following code into the console window and see what happens. Do not worry if you don’t know what any of it means.
# plot some linearly-related simulated data
set.seed(5) # set the random number seed
n <- 400
x_data <- rnorm(n, mean = 100, sd = 15)
y_data <- x_data + rnorm(n, mean=0, sd = 5)
plot(x_data, y_data)
fit <- lm(y_data ~ x_data)
abline(fit)

# don't know what the abline function does? Check out its help:
?abline
  • Now we will take a tour of some of the windows
    • Environment
    • History
    • Plots
    • Files
    • Help
    • etc

The R Console

“Communicating” with R

  • R is an interpreted language, so it just reads through source code line by line and executes the instructions it finds.
  • Contrast to compiled language like C in which all the source code is parsed and checked for consistency, and then binary computer instructions are made from that and optimized for speed, memory efficiency, etc.
  • Because R goes line by line, we can work interactively with it
  • You can also write programs in a text editor and then send them to R, all at once, to be interpreted.
    • This is much more reproducible than interactive work
    • But interactive work is good for testing things, etc.

The R Console

This is R’s “command prompt” where you type expressions and, when you hit RETURN, these expressions are evaluated (so long as they are complete expressions). Here is what it looks like:

>

That little > symbol at the beginning of the line says “R is ready to begin a new expression.” Now try entering an expression:

13+19
#> [1] 32

It evaluated “13+19” and, by default, if “nothing is done” with the result of the evaluation, R just prints the result. We’ll talk about the significance of the [1] later.

Inserting Comments in Your Code

  • Though it doesn’t seem all that useful when working in interactive mode, you can write comments in your code.
  • These let you point things out in your code, but not confuse the interpreter.
  • Everything right of a # until the end of the line is ignored by the interpreter:
13+19 # Look at me! I'm adding two primes
#> [1] 32

Command Recall

Using your up arrow will recall the expressions that have been evaluated recently by the R interpreter. This can be useful when working interactively.

A Semicolon is as good as a RETURN

You can break a line into multiple expressions that get evaluated individually by separating them with semicolons:

> 13+19; 12+2
[1] 32
[1] 14

When R gets to each semicolon, it evaluates the expression and then, because we aren’t doing anything else with it, it prints the result.

Incomplete Expressions Make R wait for you

If you press RETURN and R finds that the expression you typed is incomplete then it taps its foot and waits for you to input the rest of the expression, to make it complete. To show you that it is tapping its foot, R changes the prompt from > to +

13+   # This expression is incomplete!  
+  19   # when I enter this, R has a complete expression and can evaluate it!

If you force R to evaluate an incomplete expression by giving it a ; at the end of it, R will complain by giving an error message

13+  # incomplete expression
+ ;    # I don't care R! Evaluate it anyway you lousy bum!
Error: unexpected ';' in:
"13+
;"

Note that this is useful if you don’t know why your expression is incomplete and you want to start all over again anyway.

R is more than a glorified calculator

Please tell me that R does more than just adding numbers together!! Please, please!

You’re in luck. Though your R expressions can be as simple as adding two numbers together, the power of R starts to be realized when you assign values to objects, because you then can evaluate expressions involving those objects.

Objects vs Symbols

Warning This Little Section is Sort of Superfluous and Likely Confusing

You needn’t read this bit…

Hard core R people who trip out on the many cool meta-language features of R are likely to want to draw a distinction between objects (think of them as places in computer memory where values are stored) and symbols, which are the names the programmer gives to objects to access them (for example, the name fit several sections back). This is useful, but for a first course in R, it will probably suffice to just think in terms of objects and use the word “object” to refer to the computer instantiation of something that can hold a value and for the name that the programmer gives to it (i.e. fit).

Objects

  • An object is the name given to “anything that the R interpreter might know about.” One might say that “everything in R” is an object.
  • In vernacular, some might refer to an object as a variable
  • Examples:
x <- 1:10  # x is the name of an object that holds a vector 1, 2, 3, ..., 10
fit <- lm(y_data ~ x_data) # fit is an object holding the output of a linear model

# for that matter, y_data and x_data are objects.
  • If you want to give a name to an object and refer to it in your code, you must adhere to a few simple standards. You must:
    • use only letters or numerals: (a-zA-Z0-9) as well as . and _
    • start the object’s name with a letter or with .

QUIZ: What (if anything) is wrong with each of these potential object names: my.variable, your%variable, foo_bar, _bar_foo, .boing, Fish_lengths, 9_treatment_results

Assigning Values

The special operators:
<- and ->
are used to assign values to the objects.

  • These are like pointy arrows or sticks, so you can think of them as “stick this value in the object”
x <- 13    # leftwards assignment is typically used
19 -> y    # rightwards assignment is seldom used
z <- x + y   # assign result of the evaluated expression

# hey! it didn't print any result!
z  # just type the object to print it
#> [1] 32
  • If you assign a result, there isn’t anything left to print. But if you just type an object’s name, the value of the object is printed. This is called “Auto Printing”.
  • Note that -> is very seldom used in programs. It was more useful back in the days when people did a lot of interactive work with R.

R Assigns Values, not References

  • In some languages, like C, there are facilities to assign a variable to “point at” a specific area in the computer memory where the value of an object is stored. Then, if the value in that area of memory changes, you get that new value when you extract it using the variable that points at that area of memory.
  • That is emphatically not what happens in R. Observe:
x <- 19
y <- x
y
#> [1] 19
x <- 13
x
#> [1] 13
y
#> [1] 19

If you assign the value of object1 to object2, the value of object2 will not change if you subsequently change the value of object1.

I Need More Than Scalars!!

Data almost always come as more than one number. If you caught a lot of fish and recorded their lengths in mm it would be way-totally-super-uncool if you had to do this:

fish1 <- 121; fish2 <- 95; fish3 <- 87
fish4 <- 142  # and so on and on, so help me God!

But R has you covered here. R loves vectors! (one dimensional arrays), observe:

fish <- c(121, 95, 87, 142)
fish
#> [1] 121  95  87 142

A quick word on functions

  • In the previous slide we saw the c function, which is the first function we have seen that actually looks like a function (technically x+y is "+"(x,y), but let us not get into that now).
  • In R, functions typically take the form of:
    function_name( function_arguments )
    where function_name must follow the requirements for names of objects and the function_arguments could be lots of different things (objects or values), as long as they make sense for the function.
  • You pass arguments (which are typically other objects that have some values assigned to them) to a function and the function returns a result.

The c() function

In

fish <- c(121, 95, 87, 142)

The arguments to c are 121, 95, 87, and 142.

  • The arguments could be objects, of course:
fish1 <- 121; fish2 <- 95; fish3 <- 87; fish4 <- 142;
catfish <- c(fish1, fish2, fish3, fish4)
catfish
#> [1] 121  95  87 142
  • The c function simply catenates its arguments together and returns the result as a vector.
  • R treats everything as a vector: that [1] in its output tells us that the next thing it prints is the first element of a vector!

comments powered by Disqus