Vectorization, Recycling, and Indexing in R
Many functions and almost all the operators (like +, *, etc.) are vectorized.
They operate very quickly on each element of an atomic vector.
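To make "vectorized" concrete, here is a minimal sketch comparing a vectorized expression with an explicit loop that computes the same thing one element at a time. The names vectorized and looped are illustrative only.

```r
x <- c(1, 4, 9, 16)

# Vectorized: sqrt() and + each act on every element in one call
vectorized <- sqrt(x) + 1
vectorized
#> [1] 2 3 4 5

# The same computation written as an explicit loop over the elements
looped <- numeric(length(x))
for (i in seq_along(x)) {
  looped[i] <- sqrt(x[i]) + 1
}
identical(vectorized, looped)
#> [1] TRUE
```

The vectorized form is shorter and, in R, usually much faster than the equivalent loop.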
Goals: we want to learn about:
- Vectorization
- Recycling
- Indexing
These three ideas are fundamental to R.
We will also discuss:
- Comparison operators
- Logical operators
- Mathematical operators
Binary Comparison Operators
These are “binary” because they involve two arguments.
They operate elementwise on vectors and return logical vectors.
x < y # less than
x > y # greater than
x <= y # less than or equal to
x >= y # greater than or equal to
x == y # equal to
x != y # not equal to
== is the “comparison equals”, which tests for equality. (Be careful not to use =, which in today’s versions of R is actually interpreted as leftwards assignment.)
Binary Comparison Examples
With numeric vectors
x <- c(1,2,5)
y <- c(4,4,3)
x == y
#> [1] FALSE FALSE FALSE
x != y
#> [1] TRUE TRUE TRUE
x < y
#> [1] TRUE TRUE FALSE
With strings
a <- c("izzy", "jazz", "tyler")
b <- c("devon", "vanessa", "hilary")
a < b # alphabetical order
#> [1] FALSE TRUE FALSE
Here is a tricky combination of both. Can you parse it?
(a < b) <= (x == y) # trickier...notice the parentheses to force precedence
#> [1] TRUE FALSE TRUE
Binary Comparison Between a Vector and a Scalar
Check this out:
x <- 1:10 # the colon operator returns a sequence
x
#> [1] 1 2 3 4 5 6 7 8 9 10
x <= 3
#> [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# compare this to:
x <= c(3,3,3,3,3,3,3,3,3,3)
#> [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
What is going on here?
Comparison of Vectors With Different Lengths
Try this one:
x <- 1:10
x
#> [1] 1 2 3 4 5 6 7 8 9 10
x > c(1,7)
#> [1] FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
# compare this to:
x > c(1,7, 1,7, 1,7, 1,7, 1,7)
#> [1] FALSE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE
To understand what is going on here, we need to talk about recycling.
Recycling of Vectors in R
A super-wickedly important concept: R likes to operate on vectors of the same length, so if it encounters two vectors of different lengths in a binary operation, it merely replicates (recycles) the shorter vector until it is the same length as the longer vector, and then it does the operation.
If the recycled smaller vector has to be “chopped off” to make it the length of the longer vector, you will get a warning, but it will still return a result:
x <- c(1,2,3)
y <- c(1,10)
x * y
#> Warning in x * y: longer object length is not a multiple of shorter object
#> length
#> [1] 1 20 3
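One way to see exactly what recycling does is to build the recycled vector yourself with base R's rep_len(), which repeats a vector out to a given length, chopping it off if necessary. This sketch reuses the x and y above; y_recycled is an illustrative name, and suppressWarnings() only hides the length warning.

```r
x <- c(1, 2, 3)
y <- c(1, 10)

# rep_len() recycles y out to the length of x, chopping off the extra
y_recycled <- rep_len(y, length(x))
y_recycled
#> [1]  1 10  1

# Multiplying by the explicitly recycled vector gives the same answer,
# this time with no warning, because the lengths now match
x * y_recycled
#> [1]  1 20  3
identical(suppressWarnings(x * y), x * y_recycled)
#> [1] TRUE
```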
We Will See Recycling in Many Contexts
Recycling occurs wherever two or more vectors get operated on elementwise, not just with comparison operators. It also happens (as we saw above) with mathematical operators. And it also happens with indexing operators when indexing by logical vectors (you’ll see that later)!!
You gotta know it! Here are some more examples:
x <- 1:20
x * c(1,0) # turns the even numbers to 0
#> [1] 1 0 3 0 5 0 7 0 9 0 11 0 13 0 15 0 17 0 19 0
x * c(0, 0, 1) # turns non-multiples of 3 to 0
#> Warning in x * c(0, 0, 1): longer object length is not a multiple of
#> shorter object length
#> [1] 0 0 3 0 0 6 0 0 9 0 0 12 0 0 15 0 0 18 0 0
x < ((1:4)^2) # recycling c(1, 4, 9, 16)
#> [1] FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
#> [12] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Combinations of Comparisons; Logical Ops
A Weather Example
- Suppose we have two variables, temp (in degrees Celsius) and precip (in mm), each a vector of length 365.
- Tell me how you test for:
- All days with temp less than 10 and precip greater than 5
- Days with temp greater than 15 or with no precip (or both)
- Days with temp greater than 15 or with no precip (but not both)
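Here is one way to answer those three questions, using the logical operators &, | and xor() described in Logical Operators-II below. We don't have a real year of data, so this sketch makes up hypothetical five-day temp and precip vectors (values chosen only for illustration); the same expressions work unchanged on vectors of length 365.

```r
# Hypothetical data for five days (illustrative values only)
temp   <- c(8, 16, 12, 20, 5)   # degrees Celsius
precip <- c(7,  0,  0,  3, 2)   # mm

# Days with temp less than 10 AND precip greater than 5
cold_wet <- temp < 10 & precip > 5
cold_wet
#> [1]  TRUE FALSE FALSE FALSE FALSE

# Days with temp greater than 15 OR no precip (or both)
warm_or_dry <- temp > 15 | precip == 0
warm_or_dry
#> [1] FALSE  TRUE  TRUE  TRUE FALSE

# Days with temp greater than 15 OR no precip, but not both
warm_xor_dry <- xor(temp > 15, precip == 0)
warm_xor_dry
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```

Day 2 is both warm and dry, so it is TRUE for the second test but FALSE for the third.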
Logical Operators-I
These operate on logicals and return logicals. Numeric and complex vectors are coerced to logical before these are applied.
- Unary operator (operates elementwise on a single vector):
  ! turns TRUE to FALSE and FALSE to TRUE
x <- c(T, T, F, F) # you can use abbreviations for TRUE and FALSE...
x
#> [1] TRUE TRUE FALSE FALSE
!x
#> [1] FALSE FALSE TRUE TRUE
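The coercion rule means numbers can be fed to the logical operators directly: 0 becomes FALSE and any other number becomes TRUE. A quick sketch (the vector n is an illustrative name):

```r
n <- c(0, 1, -2.5, 0)

# ! first coerces n to logical (0 -> FALSE, nonzero -> TRUE), then negates
!n
#> [1]  TRUE FALSE FALSE  TRUE

# & coerces both sides the same way before combining them
n & c(1, 1, 1, 0)
#> [1] FALSE  TRUE  TRUE FALSE
```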
Logical Operators-II
- Binary operators (operate elementwise on two vectors):
  & : logical AND
  | : logical OR
  xor(x,y) : logical EXCLUSIVE OR
x <- c(NA, T, F, T, F)
y <- c(T, T, F, F, NA)
x
#> [1] NA TRUE FALSE TRUE FALSE
y
#> [1] TRUE TRUE FALSE FALSE NA
x & y
#> [1] NA TRUE FALSE FALSE FALSE
x | y
#> [1] TRUE TRUE FALSE TRUE NA
xor(x,y)
#> [1] NA FALSE FALSE TRUE NA
Mathematical Operators
These operate on numeric or complex mode data and return the same mode:
x + y # addition
x - y # subtraction
x * y # multiplication
x / y # division
x ^ y # exponentiation
x %% y # modulo division (remainder) 10 %% 3 = 1
x %/% y # integer division: 10 %/% 3 = 3
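These operators are vectorized and recycle just like the comparison operators. The sketch below checks the two less familiar ones, %% and %/%, on a few values. Note that in R the remainder from %% takes the sign of the divisor (so -7 %% 3 is 2), which keeps (x %/% y) * y + x %% y equal to x for these integer-valued inputs.

```r
x <- c(10, 11, 12, -7)
y <- 3                       # a length-1 y is recycled against x

x %% y                       # remainders
#> [1] 1 2 0 2
x %/% y                      # integer quotients (rounded down)
#> [1]  3  3  4 -3

# The quotient and remainder reassemble the original numbers
(x %/% y) * y + x %% y
#> [1] 10 11 12 -7
```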
Grouping Parts of Expressions
Parentheses are good for ensuring that parts of complex expressions are evaluated in the right order.
But in case you want to look like a real code jock and skip the parentheses, learn the rules of precedence.
Precedence of Operators We Have Seen
From highest to lowest:
^ # exponentiation (right to left)
- + # unary minus and plus
: # sequence operator
%% %/% # special operators, including modulo and integer division
* / # multiply, divide
+ - # (binary) add, subtract
< > <= >= == != # ordering and comparison
! # negation
& # and
| # or
-> # rightwards assignment
<- # assignment (right to left)
= # assignment (right to left)
Higher precedence operators “stick” more tightly to their arguments. So, for example:
x<-3
y<-2
-x * y # this is like (-x) * y
#> [1] -6
-x ^ y # this is like -(x ^ y)
#> [1] -9
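The “(right to left)” note on ^ in the precedence list matters when you chain exponents: R groups 2^3^2 as 2^(3^2), not (2^3)^2. A quick check:

```r
2^3^2        # read as 2^(3^2) = 2^9
#> [1] 512
(2^3)^2      # parentheses force left-to-right grouping
#> [1] 64
```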
One very important precedence rule
Notice that : has higher precedence than the binary +, -, *, or /.
Thus
1:5*3 # this is (1:5)*3
#> [1] 3 6 9 12 15
1:(5*3) # this is 1:15
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Or, if you want the sequence of numbers from 0 to n-1, be careful:
n <- 5
0:n-1 # wrong
#> [1] -1 0 1 2 3 4
0:(n-1) # right
#> [1] 0 1 2 3 4
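Going the other way, unary minus sits above : in the precedence list, so a leading minus applies only to the first number of a sequence:

```r
-1:3         # this is (-1):3
#> [1] -1  0  1  2  3
-(1:3)       # negate the whole sequence
#> [1] -1 -2 -3
```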
Built-in Help on Functions and Operators
Recall that ?function_name returns help (if available) for the function named function_name:
# examples:
?c
?sum
?mean
Built-in help on the topics we have discussed today can be found at ?Syntax, ?Logic, ?Comparison, and ?Arithmetic.
Also, all material here is covered in parts of sections 1 through 3 in intro.pdf available on CRAN.
Indexing
There are times when we want to access one or just a few elements from a vector. We’ve already seen an example of extracting a single element:
x <- c("devon", "alicia", "cassie")
x[2] # this extracts the second element of x
#> [1] "alicia"
Vectors in R are 1-based: elements are subscripted “1, 2, 3, …” instead of “0, 1, 2, …”.
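Because counting starts at 1, there is no element 0. Indexing with 0 is not an error; it simply selects nothing, as this quick check on the same x shows:

```r
x <- c("devon", "alicia", "cassie")
x[1]          # the first element
#> [1] "devon"
x[0]          # an empty character vector, not an error
#> character(0)
```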
Overview: 4 Ways To Extract from a Vector
Single square brackets are the indexing operators. There are four (common) ways of using the indexing operators. They differ by putting different things inside of the square brackets:
- A vector of positive indices: x[c(1,6,4)]
- A vector of negative indices: x[-c(1,6,4)]
- A logical vector of the appropriate length: x[c(T,F,F,T,T)]
- A character vector of names: x[c("Sept10","Sept24")], if the vector has a names attribute.
Number four should not make sense to you yet!
Indexing With Positive Integers
A vector of positive integers extracts the corresponding elements, in the same order and as many times as the indices are listed in the vector:
x <- c(5,4,7,8)
x[c(4,4,4,2,2,1,3,2)] # returns a vector of length 8!
#> [1] 8 8 8 4 4 5 7 4
If an index exceeds the length of the vector, it returns an NA for that element, and gives no warning of this:
x <- c(5,4,7,8)
x[c(4,1,3,5)] # the 4th element of the returned vector is NA
#> [1] 8 5 7 NA
Indexing With Negative Integers
- A vector of negative integers says, “extract everything except these indices.”
- The order of the remaining elements is preserved.
- Multiple instances of the same negative integer have the same effect as a single one.
- Negative integers exceeding the length of the vector are silently ignored.
x <- c(5,4,7,8) # here is our vector...
x[-2]
#> [1] 5 7 8
x[-c(2,4)]
#> [1] 5 7
x[-c(2,2,2,2,4,4,4,4)]
#> [1] 5 7
x[-c(2,4,5,10,18)]
#> [1] 5 7
You cannot mix positive and negative indices!
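R enforces this with an error rather than guessing what you meant. The sketch below catches the error with tryCatch() so you can inspect it instead of having the script stop; result is an illustrative name, and the exact wording of the message may vary between R versions, so it is not shown.

```r
x <- c(5, 4, 7, 8)

# Mixing a negative and a positive index is an error; tryCatch() turns
# the error into its message so we can look at it
result <- tryCatch(x[c(-1, 2)], error = function(e) conditionMessage(e))
is.character(result)   # TRUE: we got an error message, not a subset
#> [1] TRUE

# Use one kind of index consistently instead
x[-c(1, 3, 4)]         # drop elements 1, 3 and 4
#> [1] 4
```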
Indexing with Logical Vectors
- You can supply a logical vector that is “parallel” to the vector you want to extract from. Any element where a TRUE occurs in the index vector gets returned. Order of elements is preserved and elements can’t get replicated.
x <- c(5,4,7,8)
x[c(FALSE, TRUE, TRUE, FALSE)]
#> [1] 4 7
- If the index vector is shorter than the vector being indexed, the index vector is recycled
x <- c(5,4,7,8)
x[c(FALSE, TRUE)]
#> [1] 4 8
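In practice the logical index vector is rarely typed by hand; it usually comes from a comparison on the same vector, which returns a parallel logical vector. This is where the comparison operators and indexing fit together:

```r
x <- c(5, 4, 7, 8)

x > 4                # a logical vector parallel to x
#> [1]  TRUE FALSE  TRUE  TRUE
x[x > 4]             # keep only the elements where the comparison is TRUE
#> [1] 5 7 8
x[x %% 2 == 0]       # even elements (%% binds tighter than ==)
#> [1] 4 8
```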
Empty Subscript Indexing
- Here is a quirky feature that you should get to know well, as it will help to understand matrix and data.frame subscripting.
- If you apply an empty indexing operator [] to a vector, then it returns everything in the vector. Observe:
x <- c(5,4,7,8)
x[]
#> [1] 5 4 7 8
x
#> [1] 5 4 7 8
- “When you give R nothing it gives you everything in return!”
The Replacement form of Indexing
- Also called the assignment form. Allows you to change specified elements of a vector while leaving the others untouched (except for mode changes due to coercion!)
This usually takes some getting used to, but you will use it all over in R. So get comfortable with it!
x <- c(5,4,7,8)
y <- x
x[c(1,3)] <- 0
x
#> [1] 0 4 0 8
x <- y
x[c(T,F,T,F)] <- 1
x
#> [1] 1 4 1 8
x <- y
x[-c(1,3)] <- NA
x
#> [1] 5 NA 7 NA
x <- y
x[c(1,3)] <- c("a","c") # coercion of remaining elements
x
#> [1] "a" "4" "c" "8"
x <- y
x[c(3,1,2)] <- c("boing1", "boing2", "boing3") # note ordering
x
#> [1] "boing2" "boing3" "boing1" "8"
x <- y
# the value is recycled to length 6; when an index repeats,
# the last assignment to it wins
x[c(3,1,3,2,2,2)] <- c("boing1", "boing2", "boing3")
x
#> [1] "boing2" "boing3" "boing3" "8"
The vector that is being assigned gets recycled as needed to match the length of the (extracted part of the) vector being indexed and assigned to.
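The empty subscript [] from the Empty Subscript Indexing section also has a replacement form. Assigning to x[] keeps a vector of the same length and recycles the right-hand side into every position, whereas a plain x <- 0 throws the old vector away. A minimal sketch (a, b and z are illustrative names):

```r
a <- c(5, 4, 7, 8)
a[] <- 0             # every element replaced; length stays 4
a
#> [1] 0 0 0 0

b <- c(5, 4, 7, 8)
b[] <- c(1, 2)       # the value c(1, 2) is recycled to length 4
b
#> [1] 1 2 1 2

z <- c(5, 4, 7, 8)
z <- 0               # ordinary assignment: z is now just a single 0
length(z)
#> [1] 1
```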
Assignment Beyond the Length of the Vector
This is allowable when using the replacement form. Intermediate elements are set to NA.
x <- c(5,4,7,8)
length(x)
#> [1] 4
x[10] <- 12
x
#> [1] 5 4 7 8 NA NA NA NA NA 12
length(x)
#> [1] 10
Those NAs don’t get overwritten by recycling. Recycling only occurs to match the length of the vector returned by the indexing operation:
x <- c(5,4,7,8)
x[11:19] <- c(-1,0,1)
x
#> [1] 5 4 7 8 NA NA NA NA NA NA -1 0 1 -1 0 1 -1 0 1