## Problems to be done for “Homework Set 2”

These exercises cover coercion, recycling, and indexing, including indexing with names. For each problem, evaluate all the code in the code chunk (highlight it and press Cmd-Enter, or Ctrl-Enter on a PC), then look at each of the variables involved before writing your answer.

Make sure your document still knits successfully before submitting.

**Instructions for completing this homework can be found** HERE

### Homework Set 2, #1: “coerce-and-multiply”

```
# Joe R. Newbie is trying to compute the componentwise product of two
# vectors x and y, but is running into trouble. Here is what he has
# done so far:
x <- c(3, 9, 12, "16", 11.4)
y <- c(2, 15, 10, 7, 5)
# when he tries to multiply these he gets an error. Use an `as.` function
# to coerce x appropriately and then return the product of x and y.
submit_answer({
})
```

For the following, recall from this lecture how to test for missing data.
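
As a quick refresher, here is a minimal sketch of testing for missing values with `is.na()`. The vector `v` is a hypothetical example and is not part of the homework:

```r
# hypothetical vector with two missing values (not part of the homework)
v <- c(4, NA, 7, NA, 1)
# is.na() returns TRUE at each position that holds an NA
is.na(v)
#> [1] FALSE  TRUE FALSE  TRUE FALSE
# count the missing values
sum(is.na(v))
#> [1] 2
# comparing with == does not work: every result is itself NA
v == NA
#> [1] NA NA NA NA NA
```

The logical vector returned by `is.na()` has the same length as its input, so it can be used for indexing like any other logical vector.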

### Homework Set 2, #2: “do-stuff-with-NAs”

```
# z is a vector with some missing data values, and w is
# a vector of the same length with no missing data:
set.seed(5)
w <- sample(1:20, 10)
z <- sample(1:20, 10)
z[sample(1:length(z), 4)] <- NA
# return a vector that has all the non-NA values in z in the
# order in which they occur in z.
submit_answer({
# <- put your answer to the left of the #.
}, subprob = "-a")
# In the above, don't worry about the "subprob" argument. That is just
# part of the problem naming and numbering system.
# Another exercise: Return all the values in w that
# occur at the same position as the NAs in z.
submit_answer({
}, subprob = "-b")
# Another exercise: Return a vector which is like z, but in which all
# the non-missing values have been multiplied by 2.5 and all the missing
# values (NAs) have been turned into -1's
submit_answer({
}, subprob = "-c")
# Last subproblem: Modify z so that every NA gets replaced by the value
# in the same position in the vector w
submit_answer({
}, subprob = "-d")
```

## About Euclidean distance

If you have two vectors $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$ that describe two points in an $n$-dimensional space, the Euclidean distance between the points is defined as:

$$
d(p,q) = \biggl( \sum_{i=1}^n (p_i - q_i)^2 \biggr)^{\frac{1}{2}}
$$
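
As a small worked example of this formula, take the two-dimensional points $p = (1, 2)$ and $q = (4, 6)$. These values are chosen purely for illustration:

$$
d(p,q) = \biggl( (1 - 4)^2 + (2 - 6)^2 \biggr)^{\frac{1}{2}} = (9 + 16)^{\frac{1}{2}} = 5
$$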

The next problem asks you to compute Euclidean distance between two vectors.

### Homework Set 2, #3: “euclidean-distance”

```
# Let p and q be two vectors defining points in a 20-dimensional space:
set.seed(10)
p <- c(-1,1) * rnorm(20, mean=6, sd=2)
q <- c(-1,1) * rnorm(20, mean=6, sd=2)
# return the Euclidean distance between p and q. Note that if you are
# not familiar with the sum() function you should read about it in the
# help files by typing "?sum" at your R prompt.
submit_answer({
})
```

### Homework Set 2, #4: “bin-comp-combo”

```
# let a, b, and c be the following vectors:
set.seed(1)
a <- sample(letters, 100, replace = TRUE)
b <- rnorm(100)
c <- sample(1:1000, 100)
# return all the values in c that correspond to positions in
# the vectors where:
# values in a are between "g" and "m", inclusive, alphabetically
# AND
# values in b are less than -1.5 or greater than 1.0
# For checking, your result should have length 6.
submit_answer({
})
```

### Homework Set 2, #5: “indexing-and-recycling”

```
# f is capital letters of the alphabet
f <- LETTERS
# Index f with a logical vector (using recycling) to return every
# third element in f (i.e. elements 3, 6, 9,...)
submit_answer({
}, subprob = "-a")
# Use recycling with a logical vector
# to return every 3rd element in f, starting on element number 2 (i.e.
# get elements 2, 5, 8, ...)
submit_answer({
}, subprob = "-b")
# A new problem: Given the vector:
g <- 10:21
# Multiply every odd number in g by 2 and every even number
# in g by 3. Use recycling. Write as short an expression as
# possible
submit_answer({
}, subprob = "-c")
```

### Homework Set 2, #6: “using-names”

```
# here are some names of salmon populations in CA and OR:
pops <- c("Eel_R", "Russian_R", "Klamath_IGH_fa", "Trinity_H_sp", "Smith_R", "Chetco_R", "Cole_Rivers_H", "Applegate_Cr", "Coquille_R", "Umpqua_sp", "Siuslaw_R")
# each one of these populations belongs to a so-called
# "reporting-unit" which may include multiple populations.
# Here are the reporting units corresponding to the populations in pops:
repunits <- c("CaliforniaCoast", "CaliforniaCoast", "KlamathR", "KlamathR", "NCaliforniaSOregonCoast", "NCaliforniaSOregonCoast", "RogueR", "RogueR", "MidOregonCoast", "MidOregonCoast", "MidOregonCoast")
# here are the populations-of-origin for 25 fish caught
# in a fishery off the coast of California:
set.seed(12)
fish_seq <- sample(pops, 25, replace = TRUE)
# Problem (a): Instead of knowing the sequence of salmon populations, some
# fishery managers want you to give them the sequence of *reporting units*.
# Return a vector of length 25 (same length as fish_seq) that gives the sequence of reporting units
# of the fish in fish_seq. Do this by setting the names attribute of
# repunits to be the pops and then indexing that vector with fish_seq.
submit_answer({
}, subprob = "-a")
# Now, 20 more fish were caught and their lengths measured in mm. Those
# lengths are recorded in fish_len, and the populations from which those
# fish came are recorded in the names attribute of fish_len
set.seed(2)
fish_len <- floor(rnorm(20, mean = 700, sd = 90))
names(fish_len) <- sample(pops, 20, replace = TRUE)
# Problem (b): Create a new vector equal to fish_len, but give it
# names that are the reporting units corresponding to the
# fish_len populations. Call it fish_lr, and, after creating it
# return it.
submit_answer({
}, subprob = "-b")
# Problem (c): Extract the lengths of the 9 fish from the MidOregonCoast
# reporting unit. Don't do this by hand! Use a tidy expression (like indexing
# on the basis of a comparison of the names attribute of fish_lr)
submit_answer({
}, subprob = "-c")
# Bonus question: Why can't you get those 9 fish lengths by doing this: fish_len["MidOregonCoast"] ?
```

## Sorting in R

We are going to talk briefly about sorting in R. There are two main functions used for sorting: `sort` and `order`.

The `sort` function returns a sorted version of its input vector. For example:

```
r <- c(4, 7, 1, 3, 12) # not sorted
sort(r) # returns all the elements of r in sorted order
#> [1] 1 3 4 7 12
```

This is useful when all you want to do is sort a single vector on the basis of its elements. However, much of the time when you are sorting data, you will be sorting one vector *on the basis of a different vector*. The `sort` function is not useful for that. Instead you can use the `order` function.

The `order` function returns the indices which, if used to index its argument, would put it in sorted order. So, for example:

```
r <- c(4, 7, 1, 3, 12) # not sorted (same vector as above)
order(r) # indices that would extract elements from r in sorted order
#> [1] 3 4 1 2 5
# note that you can achieve the same thing as sort(r) with
# r[order(r)]:
sort(r)
#> [1] 1 3 4 7 12
r[order(r)]
#> [1] 1 3 4 7 12
```
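
The same idea extends to sorting one vector on the basis of another. The sketch below uses two hypothetical vectors, `height` and `id`, that are not part of the homework, and assumes position `i` in each refers to the same individual:

```r
# hypothetical data (not part of the homework): heights in cm and an
# identifier for each individual, in matching positions
height <- c(172, 158, 181, 165)
id <- c("d", "a", "c", "b")
# indices that would put height in increasing order
order(height)
#> [1] 2 4 1 3
# use those indices to rearrange id, so that id is sorted by height
id[order(height)]
#> [1] "a" "b" "d" "c"
```

Because `order` returns indices rather than values, the same index vector can be reused to rearrange any number of parallel vectors consistently.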

`order` is considerably more versatile. We’ll do a quick problem on it.

### Homework Set 2, #7: “using-order”

```
# Imagine you have measured the weights (in kg) and lengths (in mm) of
# 20 fish and recorded them in the variables wt and len.
set.seed(3)
wt <- round(rnorm(20, mean = 15, sd = 3), digits = 1)
len <- wt * 53 + floor(rnorm(20, mean = 0, sd = 50))
# and let the population from which each fish came be recorded in
# the variable wpop
wpop <- sample(c("Eel_R", "Russian_R", "Klamath_IGH_fa", "Trinity_H_sp", "Smith_R", "Chetco_R", "Cole_Rivers_H", "Applegate_Cr", "Coquille_R", "Umpqua_sp", "Siuslaw_R"), 20, replace = TRUE)
# Problem (a): Return the vector wt sorted alphabetically
# on the population that each fish came from.
submit_answer({
}, subprob = "-a")
# Problem (b): Return len sorted in DECREASING order of the
# weight of each fish. (do ?order to learn about sorting in increasing
# vs decreasing order.)
submit_answer({
}, subprob = "-b")
```
