Reproducible Research Course by Eric C. Anderson for NOAA/SWFSC


Vectorization, Recycling, and Indexing in R

Many functions and almost all operators (like +, *, etc.) are vectorized.

That is, they operate, very quickly, on each element of an atomic vector.
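To make this concrete, here is a small sketch (the names `doubled` and `out` are just illustrative) contrasting a vectorized expression with the explicit loop it replaces:

```r
x <- c(1, 2, 5)

# Vectorized: one expression, applied to every element of x
doubled <- x * 2
doubled
#> [1]  2  4 10

# The same result computed one element at a time with a loop
out <- numeric(length(x))
for (i in seq_along(x)) {
  out[i] <- x[i] * 2
}
identical(out, doubled)
#> [1] TRUE
```

The loop works, but the vectorized version is shorter, easier to read, and typically much faster in R.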

Goals: we want to learn about:

  • Vectorization
  • Recycling
  • Indexing

These three ideas are fundamental to R.

We will also discuss:

  • Comparison operators
  • Logical operators
  • Mathematical operators

Binary Comparison Operators

These are “binary” because they involve two arguments.
They operate elementwise on vectors and return logical vectors.

x < y    # less than
x > y    # greater than
x <= y   # less than or equal to 
x >= y   # greater than or equal to
x == y   # equal to 
x != y   # not equal to

== is the “comparison equals,” which tests for equality. (Be careful not to use =, which in today’s versions of R is actually interpreted as leftwards assignment.)

Binary Comparison Examples

  • With numeric vectors

    x <- c(1,2,5)
    y <- c(4,4,3)
    x == y
    #> [1] FALSE FALSE FALSE
    x != y
    #> [1] TRUE TRUE TRUE
    x < y
    #> [1]  TRUE  TRUE FALSE
  • With strings

    a <- c("izzy", "jazz", "tyler")
    b <- c("devon", "vanessa", "hilary")
    a < b   # alphabetical order
    #> [1] FALSE  TRUE FALSE
  • Here is a tricky combination of both. Can you parse it?

    (a < b) <= (x == y)  # trickier... the parentheses are required: R won't chain comparisons like a < b <= c
    #> [1]  TRUE FALSE  TRUE

Binary Comparison Between a Vector and a Scalar

Check this out:

x <- 1:10  # the colon operator returns a sequence

x
#>  [1]  1  2  3  4  5  6  7  8  9 10

x <= 3
#>  [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# compare this to:
x <= c(3,3,3,3,3,3,3,3,3,3)
#>  [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

What is going on here?

Comparison of Vectors With Different Lengths

Try this one:

x <- 1:10

x
#>  [1]  1  2  3  4  5  6  7  8  9 10

x > c(1,7)
#>  [1] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

# compare this to:
x > c(1,7,  1,7,  1,7,  1,7,  1,7)
#>  [1] FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

To understand what is going on here, we need to talk about recycling.

Recycling of Vectors in R

A super-wickedly important concept: R likes to operate on vectors of the same length, so if it encounters two vectors of different lengths in a binary operation, it merely replicates (recycles) the shorter vector until it is the same length as the longer vector, and then it does the operation.

If the recycled shorter vector has to be “chopped off” to make it the length of the longer vector, you will get a warning, but R will still return a result:

x <- c(1,2,3)
y <- c(1,10)

x * y
#> Warning in x * y: longer object length is not a multiple of shorter object
#> length
#> [1]  1 20  3
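One way to see recycling explicitly is to do it by hand with base R's `rep_len()`, which repeats a vector out to a requested length. This sketch reuses the `x` and `y` above and reproduces the result of `x * y`, minus the warning:

```r
x <- c(1, 2, 3)
y <- c(1, 10)

# Recycle y by hand to the length of x: c(1, 10, 1)
y_long <- rep_len(y, length(x))
y_long
#> [1]  1 10  1

# Elementwise product of two equal-length vectors: no recycling needed
x * y_long
#> [1]  1 20  3
```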

We Will See Recycling in Many Contexts

Recycling occurs wherever two or more vectors get operated on elementwise, not just with comparison operators. It also happens (as we saw above) with mathematical operators. And it also happens with indexing operators when indexing by logical vectors (you’ll see that later)!!

You gotta know it! Here are some more examples:

x <- 1:20

x * c(1,0)  # turns the even numbers to 0
#>  [1]  1  0  3  0  5  0  7  0  9  0 11  0 13  0 15  0 17  0 19  0

x * c(0, 0, 1) # turns non-multiples of 3 to 0
#> Warning in x * c(0, 0, 1): longer object length is not a multiple of
#> shorter object length
#>  [1]  0  0  3  0  0  6  0  0  9  0  0 12  0  0 15  0  0 18  0  0

x < ((1:4)^2) # recycling c(1, 4, 9, 16)
#>  [1] FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
#> [12]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Combinations of Comparisons; Logical Ops

A Weather Example

  • Suppose we have two variables, temp (in degrees Celsius) and precip (in mm), each a vector of length 365.
  • Tell me how you would test for:
    • All days with temp less than 10 and precip greater than 5
    • Days with temp greater than 15 or with no precip (or both)
    • Days with temp greater than 15 or with no precip (but not both)

Logical Operators-I

These operate on logicals and return logicals. Numeric and complex vectors are coerced to logical before these operators are applied.

  • Unary operators (those that operate elementwise on a single vector)
    • ! Turns TRUE to FALSE and FALSE to TRUE

      x <- c(T, T, F, F)  # you can use abbreviations for TRUE and FALSE...
      
      x
      #> [1]  TRUE  TRUE FALSE FALSE
      
      !x
      #> [1] FALSE FALSE  TRUE  TRUE
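Here is a short sketch of the coercion rule mentioned above: before a logical operator is applied, numeric zero becomes FALSE and any other number becomes TRUE.

```r
n <- c(0, 1, -2.5)

# 0 is coerced to FALSE; any nonzero number is coerced to TRUE
as.logical(n)
#> [1] FALSE  TRUE  TRUE

# So ! applied to a numeric vector flags the zeros
!n
#> [1]  TRUE FALSE FALSE
```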

Logical Operators-II

  • Binary operators (operate elementwise on two vectors)
    • & — Logical AND
    • | — Logical OR
    • xor(x,y) — Logical EXCLUSIVE OR

      x <- c(NA, T, F, T, F)
      y <- c(T, T, F, F, NA)
      
      x
      #> [1]    NA  TRUE FALSE  TRUE FALSE
      
      y
      #> [1]  TRUE  TRUE FALSE FALSE    NA
      
      x & y 
      #> [1]    NA  TRUE FALSE FALSE FALSE
      
      x | y
      #> [1]  TRUE  TRUE FALSE  TRUE    NA
      
      xor(x,y)
      #> [1]    NA FALSE FALSE  TRUE    NA
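With &, |, and xor() in hand, each of the three questions from the weather example can be answered in one line. The data below are simulated with `rnorm()` and `rpois()` purely as hypothetical stand-ins for real `temp` and `precip` vectors; the variable names for the results are also just illustrative:

```r
set.seed(42)  # hypothetical simulated data, not real weather observations
temp   <- rnorm(365, mean = 12, sd = 8)  # degrees Celsius
precip <- rpois(365, lambda = 3)         # mm; whole numbers, so == 0 is safe

# Days with temp less than 10 AND precip greater than 5
cold_and_wet <- temp < 10 & precip > 5

# Days with temp greater than 15 OR no precip (or both)
warm_or_dry <- temp > 15 | precip == 0

# Days with temp greater than 15 OR no precip, but not both
warm_xor_dry <- xor(temp > 15, precip == 0)

# Each result is a logical vector of length 365; sum() counts the TRUEs
sum(cold_and_wet)
sum(warm_or_dry)
sum(warm_xor_dry)
```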

Mathematical Operators

These operate on numeric or complex data and return numeric or complex results.

x + y   # addition
x - y   # subtraction
x * y   # multiplication
x / y   # division
x ^ y   # exponentiation
x %% y  # modulo (remainder): 10 %% 3 = 1
x %/% y # integer division: 10 %/% 3 = 3
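A quick check of the two less familiar operators. Like the rest, they are vectorized, and for nonzero y the pair satisfies x == (x %/% y) * y + x %% y. Note that with a negative x, R's %% result takes the sign of the divisor:

```r
x <- c(10, 11, 12, -7)

x %% 3
#> [1] 1 2 0 2

x %/% 3
#> [1]  3  3  4 -3

# Integer division and remainder together rebuild the original values
(x %/% 3) * 3 + x %% 3
#> [1] 10 11 12 -7
```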

Grouping Parts of Expressions

Parentheses are good for ensuring that parts of complex expressions are evaluated in the right order.

But, in case you want to appear like a real code jock and don’t want to use parentheses, learn the rules of precedence.

Precedence of Operators We Have Seen

From highest to lowest:

^                  # exponentiation (right to left)
- +                # unary minus and plus
:                  # sequence operator
%% %/%            # special operators, including %% and %/%
* /                # multiply, divide
+ -                # (binary) add, subtract
< > <= >= == !=    # ordering and comparison
!                  # negation
&                  # and
|                  # or
->                 # rightwards assignment
<-                 # assignment (right to left)
=                  # assignment (right to left)

Higher-precedence operators “stick” more tightly to their arguments. So, for example:

x<-3
y<-2

-x * y  # this is like (-x) * y
#> [1] -6

-x ^ y  # this is like -(x ^ y)
#> [1] -9
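The table also marks ^ as right to left. A small sketch of what that means in practice:

```r
2^3^2    # right to left: 2^(3^2) = 2^9
#> [1] 512

(2^3)^2  # parentheses force left-to-right grouping
#> [1] 64
```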

One Very Important Precedence Rule

Notice that the : has higher precedence than the binary +, -, *, and /.

Thus

1:5*3   # this is (1:5)*3
#> [1]  3  6  9 12 15

1:(5*3) # this is 1:15
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Or, if you want the sequence of numbers from 0 to n-1, be careful:

n <- 5

0:n-1    # wrong
#> [1] -1  0  1  2  3  4

0:(n-1)  # right
#> [1] 0 1 2 3 4
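An alternative using base R's `seq_len()` sidesteps the precedence issue entirely. It also behaves more sensibly when n is 0, where 0:(n-1) counts down to -1:

```r
n <- 5
seq_len(n) - 1     # the sequence 1:n, shifted down by one
#> [1] 0 1 2 3 4

n <- 0
0:(n-1)            # counts down: c(0, -1)
#> [1]  0 -1
seq_len(n) - 1     # an empty vector, usually what you want
#> numeric(0)
```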

Built In Help On Functions and Operators

Recall that ?function_name returns help (if available) for the function with name function_name:

# examples:
?c
?sum
?mean

Built-in help on the topics we have discussed today can be found at ?Syntax, ?Logic, ?Comparison, and ?Arithmetic.

Also, all material here is covered in parts of sections 1 through 3 in intro.pdf available on CRAN.

Indexing

There are times when we want to access one or just a few elements from a vector. We’ve already seen how to extract a single element:

x <- c("devon", "alicia", "cassie")

x[2]  # this extracts the second element of x
#> [1] "alicia"

Vectors in R are base-1 subscripted; i.e., elements are subscripted “1, 2, 3, …” instead of “0, 1, 2, …”.
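A short illustration of base-1 subscripting: index 1 is the first element, length(x) is the last, and index 0 selects nothing at all (it returns a zero-length vector rather than an error):

```r
x <- c("devon", "alicia", "cassie")

x[1]          # the first element
#> [1] "devon"

x[length(x)]  # the last element
#> [1] "cassie"

x[0]          # an empty character vector
#> character(0)
```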

Overview: 4 Ways To Extract from a Vector

Single square brackets are the indexing operators. There are four (common) ways of using the indexing operators. They differ by putting different things inside of the square brackets:

  1. A vector of positive indices: x[c(1,6,4)]
  2. A vector of negative indices: x[-c(1,6,4)]
  3. A logical vector of the appropriate length: x[c(T,F,F,T,T)]
  4. A character vector of names: x[c("Sept10","Sept24")] if the vector has a names attribute.

Number four should not make sense to you yet!

Indexing With Positive Integers

  • A vector of positive integers extracts the corresponding elements, in the same order and as many times as the indices are listed in the vector

    x <- c(5,4,7,8)
    
    x[c(4,4,4,2,2,1,3,2)]  # returns a vector of length 8!
    #> [1] 8 8 8 4 4 5 7 4
  • If an index exceeds the length of the vector, it returns an NA for that element

    x <- c(5,4,7,8)
    
    x[c(4,1,3,5)]  # the 4th element of the returned vector is NA
    #> [1]  8  5  7 NA

    and gives no warning of this.
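Because positive indices can be listed in any order, they can also rearrange a vector. This sketch reverses x by indexing with a descending sequence, which gives the same result as the base function `rev()`:

```r
x <- c(5, 4, 7, 8)

x[length(x):1]   # indices 4, 3, 2, 1
#> [1] 8 7 4 5

identical(x[length(x):1], rev(x))
#> [1] TRUE
```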

Indexing With Negative Integers

  • A vector of negative integers says, “extract everything except these indices.”
    • The order of the remaining elements is preserved.
    • Multiple instances of the same negative integer have the same effect as a single one
    • Negative integers exceeding the length of the vector are just ignored
x <- c(5,4,7,8) # here is our vector...

x[-2]
#> [1] 5 7 8

x[-c(2,4)]
#> [1] 5 7

x[-c(2,2,2,2,4,4,4,4)]
#> [1] 5 7

x[-c(2,4,5,10,18)]
#> [1] 5 7

You cannot mix positive and negative indices!
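What happens if you try? This sketch catches the error with `tryCatch()` so you can inspect R's message without stopping a script (the exact wording of the message may vary between R versions), then shows the two legal ways to write the same selection:

```r
x <- c(5, 4, 7, 8)

# Mixing a negative and a positive index raises an error
msg <- tryCatch(x[c(-1, 2)], error = function(e) conditionMessage(e))
msg   # a character string holding R's error message

# Pick one convention instead: all negative...
x[-c(1, 3)]
#> [1] 4 8

# ...or all positive
x[c(2, 4)]
#> [1] 4 8
```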

Indexing with Logical Vectors

  • You can supply a logical vector that is “parallel” to the vector you want to extract from. Any element where a TRUE occurs in the index vector gets returned. Order of elements is preserved and elements can’t get replicated.
x <- c(5,4,7,8)

x[c(FALSE, TRUE, TRUE, FALSE)]
#> [1] 4 7
  • If the index vector is shorter than the vector being indexed, the index vector is recycled
x <- c(5,4,7,8)

x[c(FALSE, TRUE)]
#> [1] 4 8
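Logical indexing gets really handy when combined with the comparison operators from earlier, because a comparison produces exactly the parallel logical vector that [ wants. A small sketch:

```r
x <- c(5, 4, 7, 8)

x > 4            # the logical index vector
#> [1]  TRUE FALSE  TRUE  TRUE

x[x > 4]         # keep the elements greater than 4
#> [1] 5 7 8

x[x %% 2 == 0]   # keep the even elements (%% binds tighter than ==)
#> [1] 4 8
```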

Empty Subscript Indexing

  • Here is a quirky feature that you should get to know well, as it will help to understand matrix and data.frame subscripting.
  • If you apply an empty indexing operator [] to a vector, then it returns everything in the vector. Observe:
x <- c(5,4,7,8)

x[]
#> [1] 5 4 7 8

x
#> [1] 5 4 7 8
  • “When you give R nothing it gives you everything in return!”
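The empty subscript also works with the replacement form of indexing covered next. In this sketch, x[] <- 0 overwrites every element while keeping x the same length (the 0 is recycled), whereas x <- 0 replaces the whole object with a length-1 vector:

```r
x <- c(5, 4, 7, 8)
x[] <- 0     # fill every element; length stays 4
x
#> [1] 0 0 0 0

x <- c(5, 4, 7, 8)
x <- 0       # rebinds x to a brand-new length-1 vector
x
#> [1] 0
```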

The Replacement form of Indexing

  • Also called the assignment form. It allows you to change specified elements of a vector while leaving the others untouched (except for mode changes due to coercion!)
  • This usually takes some getting used to, but you will use it all over in R. So get comfortable with it!

    x <- c(5,4,7,8)
    y <- x
    
    x[c(1,3)] <- 0  
    x
    #> [1] 0 4 0 8
    
    x <- y
    x[c(T,F,T,F)] <- 1
    x
    #> [1] 1 4 1 8
    
    x <- y
    x[-c(1,3)] <- NA
    x
    #> [1]  5 NA  7 NA
    
    x <- y
    x[c(1,3)] <- c("a","c") # coercion of remaining elements
    x
    #> [1] "a" "4" "c" "8"
    
    x <- y
    x[c(3,1,2)] <- c("boing1", "boing2", "boing3") # note ordering
    x
    #> [1] "boing2" "boing3" "boing1" "8"
    
    x <- y
    x[c(3,1,3,2,2,2)] <- c("boing1", "boing2", "boing3") # value recycled; for repeated indices the last assignment wins
    x
    #> [1] "boing2" "boing3" "boing3" "8"

    The vector that is being assigned gets recycled as needed to match the length of the (extracted part of the) vector being indexed and assigned to.
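Here is a sketch of that recycling rule. Assigning a length-2 vector into four selected positions recycles it twice. If the number of selected positions is not a multiple of the replacement length, R still assigns but issues a warning (wrapped in `suppressWarnings()` here to keep the output clean):

```r
x <- c(5, 4, 7, 8)
x[1:4] <- c(0, 1)    # c(0, 1) is recycled to c(0, 1, 0, 1)
x
#> [1] 0 1 0 1

x <- c(5, 4, 7, 8)
suppressWarnings(x[1:3] <- c(0, 1))  # 3 slots, length-2 value: R warns
x
#> [1] 0 1 0 8
```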

Assignment Beyond the Length of the Vector

  • This is allowable when using the replacement form. Intermediate elements are set to NA

    x <- c(5,4,7,8)
    
    length(x)
    #> [1] 4
    
    x[10] <- 12
    
    x
    #>  [1]  5  4  7  8 NA NA NA NA NA 12
    
    length(x)
    #> [1] 10
  • Those NAs don’t get overwritten by recycling. Recycling only occurs to match the length of the vector returned by the indexing operation:

    x <- c(5,4,7,8)
    
    x[11:19] <- c(-1,0,1)
    
    x   # positions 5 through 10 stay NA
    #>  [1]  5  4  7  8 NA NA NA NA NA NA -1  0  1 -1  0  1 -1  0  1
