Reproducible Research Course by Eric C. Anderson for NOAA/SWFSC


Object (vector) attributes: length, names

Object Attributes

  • Before we can talk about indexing with names we have to talk about the fact that any object can possess attributes.
  • Every R object has two intrinsic attributes: mode and length
  • For atomic vectors, possible modes are: (fill in the blanks)
  • The functions mode(x) and length(x) return these attributes for x

The mode() function

set.seed(6)
x <- sample(x = letters, size = 10)

# a vector of letters:
x
#>  [1] "p" "x" "g" "i" "r" "u" "t" "o" "j" "b"
# what mode is it?
mode(x)
#> [1] "character"

# a vector of complex numbers
y <- 1:10 + 0+3i
y
#>  [1]  1+3i  2+3i  3+3i  4+3i  5+3i  6+3i  7+3i  8+3i  9+3i 10+3i

# what mode is it?
mode(y)
#> [1] "complex"
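For comparison, here is a quick sketch (the example values are arbitrary) showing the modes of the other common atomic vector types. Note that mode() reports both integer and double vectors as "numeric". typeof() is shown alongside because it makes that finer distinction.

```r
# one small atomic vector of each remaining type (values are arbitrary)
examples <- list(
  logical = c(TRUE, FALSE),
  integer = 1:3,
  double  = c(2.5, 3.1),
  raw     = as.raw(c(1, 255))
)

# mode() lumps integer and double together as "numeric"
sapply(examples, mode)
# gives "logical" "numeric" "numeric" "raw" (named by list element)

# typeof() reports the underlying storage type
sapply(examples, typeof)
# gives "logical" "integer" "double" "raw" (named by list element)
```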

The length() function

  • You are more likely to use length() than the mode() function. In fact, you will use it all the time.
  • As you might guess, it returns the length of an object (like a vector) as an integer.

    x <- seq(1, 5, by=.67)
    x
    #> [1] 1.00 1.67 2.34 3.01 3.68 4.35
    
    # how long is that?
    length(x)
    #> [1] 6
    
    # how can we pick out just the last element?
    x[length(x)]
    #> [1] 4.35
    
    # how can we return a vector of everything but the last element?
    x[-length(x)]
    #> [1] 1.00 1.67 2.34 3.01 3.68
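One thing that trips people up: length() counts elements, not characters. Here is a short sketch of what it returns for a few different kinds of objects (the objects themselves are just made-up examples).

```r
# a single string is a character vector with one element
length("hello")                  # 1
nchar("hello")                   # 5: nchar() counts characters

length(c("hi", "there"))         # 2

# length() works on other objects, too
length(NULL)                     # 0
length(list(1:10, "a"))          # 2: the number of list components
length(matrix(1:6, nrow = 2))    # 6: all the elements of the matrix
```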

The replacement form of the length() function

  • Check out this little bit of syntactic sugar: to change the length of an object called, say, obj, you can do this:
length(obj) <- 16
  • It will chop off the end if the new value is smaller than the old value
  • It will pad the end with NAs if the new value is larger than the old value

    x <- seq(6.5, 8.8, length.out = 15)
    x
    #>  [1] 6.500000 6.664286 6.828571 6.992857 7.157143 7.321429 7.485714
    #>  [8] 7.650000 7.814286 7.978571 8.142857 8.307143 8.471429 8.635714
    #> [15] 8.800000
    
    # make it longer
    length(x) <- 25    
    x
    #>  [1] 6.500000 6.664286 6.828571 6.992857 7.157143 7.321429 7.485714
    #>  [8] 7.650000 7.814286 7.978571 8.142857 8.307143 8.471429 8.635714
    #> [15] 8.800000       NA       NA       NA       NA       NA       NA
    #> [22]       NA       NA       NA       NA
    
    # now, chop it off if you want:
    length(x) <- 7
    x
    #> [1] 6.500000 6.664286 6.828571 6.992857 7.157143 7.321429 7.485714
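Padding works a little differently for lists, and for plain (unnamed) atomic vectors the replacement form behaves just like indexing with seq_len(). A small sketch with made-up values:

```r
# on a list, length<- pads with NULL components instead of NA
lst <- list(1, "a")
length(lst) <- 4
lst[[4]]    # NULL

# for an unnamed atomic vector, length(v) <- n matches v[seq_len(n)],
# which also pads with NA when n is larger than the current length
v <- c(2.5, 3.5, 4.5)
w <- v
length(w) <- 5
w                                 # 2.5 3.5 4.5 NA NA
identical(w, v[seq_len(5)])       # TRUE
```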

A few problems for thought

Length and names lecture, #1: “middle-extracto”

# here is a vector:
y <- seq(pi, 15*pi, by=pi)

# give me all the elements from the 4th up to the 3rd from the last:

Length and names lecture, #2: “gimme-n-of-something”

set.seed(10)
# here are a certain number of letters:
alpha <- sample(letters, size = as.integer(runif(1, min=4, max=20)))

# simulate a random normal deviate for each one (using rnorm)

The names Attribute of a Vector

  • R gives you the option of having a name for every element of a vector
  • You can set the names attribute of a vector with the replacement form of the names() function.
  • You can query the names attribute with the names() function (it returns a character vector).
  • You can index a vector that has a names attribute with names!

Setting the names of a vector

# here's a vector
x <- c(5,4,7,8)

# here we set its names to whatever we want
names(x) <- c("first", "second", "third", "boing")

# when we (auto)print the vector, the names are included above the value:
x
#>  first second  third  boing 
#>      5      4      7      8
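You don't have to set the names in a separate step. As a sketch, here are two other common ways to build the same named vector: supplying name = value pairs to c(), and the setNames() function from the stats package (attached by default in a standard R session).

```r
# the vector built in two steps, as above
x <- c(5, 4, 7, 8)
names(x) <- c("first", "second", "third", "boing")

# names supplied as name = value pairs when the vector is created
x2 <- c(first = 5, second = 4, third = 7, boing = 8)

# or values and names passed together to stats::setNames()
x3 <- setNames(c(5, 4, 7, 8), c("first", "second", "third", "boing"))

identical(x, x2)   # TRUE
identical(x, x3)   # TRUE
```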

Reading names in vector output

  • This can take some getting used to. For the first 10 years I worked with R, I always got confused about what was a name and what was a value in the output.
  • And I thought it was friggin’ ugly to have all those names on there sometimes
  • And it “can” strain the eyeballs if the names are long. Consider this ridiculous example:

    # values = the first 17 letters of the alphabet
    ab <- letters[1:17]
    ab  # not so hard to look at
    #>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
    
    # the sha1 hashes for each of the 17 letters
    library(digest)
    hashes <- unname(sapply(ab, digest, algo = "sha1"))
    hashes  # sort of ugly to look at
    #>  [1] "1f9928593251410322823fefea8c3ef79b4d0254"
    #>  [2] "ee6e7fdb03a0d35b3a6f499d0f8f610686551d51"
    #>  [3] "8e7f9fe32c49050c5ca146150fc58b93fbeea245"
    #>  [4] "e59165f73b7dc7e0d6ae94ec9aac9e8e95fd8a2c"
    #>  [5] "7f608bde8f0e308aa8866d737ddebbfae9674163"
    #>  [6] "86e99e22d003547538a5f446165488f7861fa2c3"
    #>  [7] "ce27dce0e84ad90d3e90e9b571a73720d0fb4890"
    #>  [8] "221799200137b7d72dfc4a618465bec71333a58b"
    #>  [9] "13b5c7533cccc95d2f7cd18df78ea78ed9111c02"
    #> [10] "88b7c7c5f6921ec9e914488067552829a17a42a4"
    #> [11] "6127e4cdbf02f18898554c037f0d4acb95c608ab"
    #> [12] "984ca0fd9ed47ac08a31aeb88f9c9a5f905aeaa2"
    #> [13] "954da0ea9a5d0aa42516beebc5542c638161934c"
    #> [14] "7d1e34387808d9f726efbb1c8eb0819a115afb52"
    #> [15] "2e21764867596d832896d9d28d6e6489a0b27249"
    #> [16] "666881f1f74c498e0292ccf3d9d26089ee79dae7"
    #> [17] "966dbbe6cf1c43ac784a8257b57896db9fd3f357"
    
    # name each element in ab with its hash
    names(ab) <- hashes
    ab  # print it out now... really hard to look at
    #> 1f9928593251410322823fefea8c3ef79b4d0254 
    #>                                      "a" 
    #> ee6e7fdb03a0d35b3a6f499d0f8f610686551d51 
    #>                                      "b" 
    #> 8e7f9fe32c49050c5ca146150fc58b93fbeea245 
    #>                                      "c" 
    #> e59165f73b7dc7e0d6ae94ec9aac9e8e95fd8a2c 
    #>                                      "d" 
    #> 7f608bde8f0e308aa8866d737ddebbfae9674163 
    #>                                      "e" 
    #> 86e99e22d003547538a5f446165488f7861fa2c3 
    #>                                      "f" 
    #> ce27dce0e84ad90d3e90e9b571a73720d0fb4890 
    #>                                      "g" 
    #> 221799200137b7d72dfc4a618465bec71333a58b 
    #>                                      "h" 
    #> 13b5c7533cccc95d2f7cd18df78ea78ed9111c02 
    #>                                      "i" 
    #> 88b7c7c5f6921ec9e914488067552829a17a42a4 
    #>                                      "j" 
    #> 6127e4cdbf02f18898554c037f0d4acb95c608ab 
    #>                                      "k" 
    #> 984ca0fd9ed47ac08a31aeb88f9c9a5f905aeaa2 
    #>                                      "l" 
    #> 954da0ea9a5d0aa42516beebc5542c638161934c 
    #>                                      "m" 
    #> 7d1e34387808d9f726efbb1c8eb0819a115afb52 
    #>                                      "n" 
    #> 2e21764867596d832896d9d28d6e6489a0b27249 
    #>                                      "o" 
    #> 666881f1f74c498e0292ccf3d9d26089ee79dae7 
    #>                                      "p" 
    #> 966dbbe6cf1c43ac784a8257b57896db9fd3f357 
    #>                                      "q"
    
    # but note that we can index with the names (as strings)
    ab["221799200137b7d72dfc4a618465bec71333a58b"]
    #> 221799200137b7d72dfc4a618465bec71333a58b 
    #>                                      "h"
    
    # in the above output, what is the value and what is the name?

Get rid of those damn names so I can read the thing

Sometimes you just want to get rid of the names so you can read stuff, or you might have another legitimate reason to do so. A handy way to do this is with the unname() function:

ab  # whoa horribly ugly named output
#> 1f9928593251410322823fefea8c3ef79b4d0254 
#>                                      "a" 
#> ee6e7fdb03a0d35b3a6f499d0f8f610686551d51 
#>                                      "b" 
#> 8e7f9fe32c49050c5ca146150fc58b93fbeea245 
#>                                      "c" 
#> e59165f73b7dc7e0d6ae94ec9aac9e8e95fd8a2c 
#>                                      "d" 
#> 7f608bde8f0e308aa8866d737ddebbfae9674163 
#>                                      "e" 
#> 86e99e22d003547538a5f446165488f7861fa2c3 
#>                                      "f" 
#> ce27dce0e84ad90d3e90e9b571a73720d0fb4890 
#>                                      "g" 
#> 221799200137b7d72dfc4a618465bec71333a58b 
#>                                      "h" 
#> 13b5c7533cccc95d2f7cd18df78ea78ed9111c02 
#>                                      "i" 
#> 88b7c7c5f6921ec9e914488067552829a17a42a4 
#>                                      "j" 
#> 6127e4cdbf02f18898554c037f0d4acb95c608ab 
#>                                      "k" 
#> 984ca0fd9ed47ac08a31aeb88f9c9a5f905aeaa2 
#>                                      "l" 
#> 954da0ea9a5d0aa42516beebc5542c638161934c 
#>                                      "m" 
#> 7d1e34387808d9f726efbb1c8eb0819a115afb52 
#>                                      "n" 
#> 2e21764867596d832896d9d28d6e6489a0b27249 
#>                                      "o" 
#> 666881f1f74c498e0292ccf3d9d26089ee79dae7 
#>                                      "p" 
#> 966dbbe6cf1c43ac784a8257b57896db9fd3f357 
#>                                      "q"

unname(ab)  # returns its argument, but with the names attribute stripped off
#>  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
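Keep in mind that unname() returns a modified copy; it does not change the object you pass it. If you want to strip the names off the object itself, you can assign NULL to its names. A minimal sketch with a made-up vector v:

```r
v <- c(a = 1, b = 2, c = 3)

# unname() returns a copy without names...
unname(v)          # 1 2 3
# ...and leaves v itself untouched
names(v)           # "a" "b" "c"

# assigning NULL removes the names attribute from v itself
names(v) <- NULL
names(v)           # NULL
```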
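Indexing works with names, too. Here x is the named vector from earlier, c(first = 5, second = 4, third = 7, boing = 8):

x[c("boing", "second")] # note names are retained in result
#>  boing second 
#>      8      4

x[c("third", "boing", "oops", "first", "first")] # note NA for the nonexistent name "oops"
#> third boing  <NA> first first 
#>     7     8    NA     5     5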

Return the names as a vector

This is super important:

The names attribute of a vector is just a vector itself of mode character.

x <- c(5,4,7,8)  # here is a vector

# here we give it a names attribute, thus providing a name
# for every element in it
names(x) <- c("first", "second", "third", "boing")
x # see it printed with the names
#>  first second  third  boing 
#>      5      4      7      8

# now, we can do this to get the names back as a vector
names(x)
#> [1] "first"  "second" "third"  "boing"

# super important!  We can even index it directly.
# for example: get all the names for which the value
# of x is greater than 6:
names(x)[x>6]
#> [1] "third" "boing"
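Because the names are just a character vector, all the usual vector tools apply to them. A short sketch using the same x (sort() and which.max() are just illustrative choices here):

```r
x <- c(first = 5, second = 4, third = 7, boing = 8)

# the names are an ordinary character vector...
is.character(names(x))              # TRUE

# ...so they can be reordered and indexed like any other vector
names(sort(x, decreasing = TRUE))   # "boing" "third" "first" "second"
names(x)[which.max(x)]              # "boing"

# a vector that was never given names returns NULL, not ""
names(c(5, 4, 7, 8))                # NULL
```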

You can index with names!

It might not seem super useful at first, but this turns out to be incredibly useful. You can write something like x[c("boing", "second")] to extract elements from a named vector.

# same setup as before:
x <- c(5,4,7,8) 
names(x) <- c("first", "second", "third", "boing")

# behold the power!
x[c("boing", "first")]
#> boing first 
#>     8     5
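A related detail worth knowing: single brackets keep the names of whatever they extract, while double brackets pull out exactly one element and drop its name. A quick sketch with the same x:

```r
x <- c(first = 5, second = 4, third = 7, boing = 8)

# single brackets keep the name of the extracted element
x["boing"]         # a length-1 vector named "boing"

# double brackets return exactly one element, without its name
x[["boing"]]       # 8

# indexing by name picks out the same elements as the matching
# positions, without needing to know where they sit in x
identical(x[c("boing", "first")], x[c(4, 1)])   # TRUE

# a missing name gives NA with single brackets, whereas
# x[["oops"]] stops with a "subscript out of bounds" error
x["oops"]          # NA
```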

Assignment Form of names Indexing

When you use the assignment form of the indexing operator and include a name that doesn’t exist yet, R extends the vector beyond its current length, adding a new element with that name at the end.

x <- c(5,4,7,8)

names(x) <- c("first", "second", "third", "boing")
x
#>  first second  third  boing 
#>      5      4      7      8

x[c("first", "third", "oofdah", "squawk")] <- c(-1,-2,-3,-4)
x
#>  first second  third  boing oofdah squawk 
#>     -1      4     -2      8     -3     -4
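Two more cases of the same behavior, sketched with made-up values: assigning to a single new name appends one element, and assigning by name into a vector that has no names at all creates a names attribute, filling in "" for the elements that were already there.

```r
x <- c(first = 5, second = 4, third = 7, boing = 8)

# assigning to one new name appends one element at the end
x["extra"] <- 99
length(x)          # 5
names(x)           # "first" "second" "third" "boing" "extra"

# assigning by name into an unnamed vector creates a names attribute,
# with "" as the name of each element that was already there
y <- c(1, 2)
y["new"] <- 3
names(y)           # "" "" "new"
```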

Using names indexing as an associative array

The power of names indexing really comes through when you want to return a value for every unique name or identifier. This sort of construction occurs all the time.
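As a minimal sketch of the idea, separate from the fish example, here is a named vector used as a lookup table. The habitat codes and descriptions are made up purely for illustration; indexing the table with a vector of (possibly repeated) codes returns one value per code, in order.

```r
# hypothetical lookup table: made-up habitat codes -> descriptions
habitat <- c(R = "riffle", P = "pool", G = "glide")

# observed codes, in any order and with repeats
obs <- c("P", "R", "R", "G", "P")

# one description per observation, in the order observed
# (unname() just drops the codes, which come along as names)
unname(habitat[obs])   # "pool" "riffle" "riffle" "glide" "pool"
```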

Here is a contrived example:

  • You are studying fish behavior and you have 16 fish that you have labelled A through P, and you have tagged them so you can tell who they are when you see them.

    IDs <- LETTERS[1:16]
  • You have recorded the fork length of each fish to the nearest mm

    set.seed(16)
    fklen <- floor(rnorm(length(IDs), mean = 150, sd = 15))
  • You have also watched them all day and recorded the order in which they jumped. There were 487 fish jumps recorded today, and you have recorded the sequence:

    sequence <- sample(IDs, 487, replace = TRUE)

Now, what you are really interested in is whether big fish are more likely to jump after big fish than after small fish, so you need a vector that gives the sequence of the fork lengths of the fish that were jumping. Can you see how to do that using names? Go for it.

Other non-intrinsic Attributes (can be skipped for now…)

  • Any object in R can have a number of different attributes. These are not the data contained in the object, but they may affect how the object is treated.
  • Attributes are fairly central to the operation of R.
  • Relevant functions:

    attributes(x)  # list all non-intrinsic attributes of x
    
    attributes(x) <- value # set all attributes of x  (seldom used)
    attr(x, "boing") # return value of x's "boing" attribute
    attr(x, "boing") <- value # set x's "boing" attribute to value

    Common attributes accessed via various convenience functions
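To connect this back to names: names are stored as an ordinary attribute, and names() and dim() are convenience functions for particular attributes. A sketch (the "units" attribute below is an arbitrary, made-up name):

```r
x <- c(first = 5, second = 4)

# names live in an ordinary attribute called "names"
attributes(x)                            # a list with one element, $names
identical(names(x), attr(x, "names"))    # TRUE

# attr() can attach an arbitrary attribute ("units" is a made-up example)
attr(x, "units") <- "mm"
names(attributes(x))                     # "names" "units"

# dim() is another convenience function: giving a vector a "dim"
# attribute turns it into a matrix
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)                             # TRUE
attributes(m)                            # a list with one element, $dim
```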

