# Atomic Data Types and Coercion

## Basic Data “Modes” of R

There are four main “modes” of scalar data, in order from least to most general:

- `logical` can take two values: `TRUE` and `FALSE`, which can be abbreviated, when you type them, as `T` and `F`.
- The `numeric` mode comes in two flavors: “integer” and “numeric” (real numbers). Examples: `1`, `3.14`, `8.2`, `10`, etc.
- `complex`: these are complex numbers of the form $a + bi$ where $a$ and $b$ are real numbers and $i=\sqrt{-1}$. Examples: `3.2+7.3i`, `4+0i`
- `character`: these take values that are often called “strings” in other languages. Examples: `"fred"`, `"foo"`, `"bar"`, `"boing"`.

There is also a `raw` mode which refers to raw bytes of data, but we won’t concern ourselves with that for now.
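One way to check the mode of a value is the base R function `mode()`. The related function `typeof()` reports the underlying storage type, which is where the two flavors of numeric show up as `"integer"` and `"double"`. A minimal sketch (the `L` suffix is how an integer literal is written):

```r
# mode() reports the mode of each kind of scalar
mode(TRUE)
#> [1] "logical"
mode(3.14)
#> [1] "numeric"
mode(3.2+7.3i)
#> [1] "complex"
mode("fred")
#> [1] "character"

# typeof() distinguishes the two flavors of numeric
typeof(10)
#> [1] "double"
typeof(10L)
#> [1] "integer"
mode(10L)
#> [1] "numeric"
```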

### Atomic Vectors

An atomic vector is a fundamental data structure in R: a vector in which every element is of the same mode. For example:

```
x <- c(1,2,3,5,7)
x
#> [1] 1 2 3 5 7
```

Pretty basic stuff, until you start accidentally or intentionally mixing modes.

```
x <- c(1,2,3,5,7,"11")
x
#> [1] "1" "2" "3" "5" "7" "11"
```

The mode of everything is *coerced* to the mode of the element with the most general mode, and this can really bite you in the rear if you don’t watch out!

## Coercion

- All the data in an atomic vector *must be of the same mode*.
- If data are added so that modes are mixed, then *the whole vector gets changed so that everything is of the most general mode*.

Example:

```
# simple atomic vector of mode numeric
x <- 1:6
x
#> [1] 1 2 3 4 5 6
# now change one to mode character and see what happens
x[1] <- "tweezer"
x
#> [1] "tweezer" "2" "3" "4" "5" "6"
```

### Coercion Up One Step

- logical to numeric:
  - `TRUE` ==> `1`
  - `FALSE` ==> `0`
- numeric to complex:
  - `6.4` ==> `6.4+0i`
  - `5` ==> `5+0i`
- complex to character:
  - `6.4+0i` ==> `"6.4+0i"`
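These one-step coercions can be seen by mixing two adjacent modes in a call to `c()`. A small sketch, using made-up values:

```r
# logical mixed with numeric: everything becomes numeric
c(TRUE, FALSE, 2)
#> [1] 1 0 2

# numeric mixed with complex: everything becomes complex
c(5, 1+2i)
#> [1] 5+0i 1+2i

# complex mixed with character: everything becomes character
c(6.4+0i, "fred")
#> [1] "6.4+0i" "fred"
```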

### Coercion Up Two Or More Steps

Note that the coercion sometimes “jumps over the intermediate steps”:

- logical to complex
  - `TRUE` ==> `1+0i`
  - `FALSE` ==> `0+0i`
- logical to character (it *does not* go `FALSE` ==> `0` ==> `"0"`)
  - `TRUE` ==> `"TRUE"`
  - `FALSE` ==> `"FALSE"`
- numeric to character
  - `7` ==> `"7"`
  - `3.1415` ==> `"3.1415"`
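The same jumps show up when `c()` mixes modes that are more than one step apart. A short sketch with illustrative values; note that the logical goes straight to `"TRUE"` rather than passing through `1`:

```r
# logical to complex
c(TRUE, 1+2i)
#> [1] 1+0i 1+2i

# logical to character: "TRUE", not "1"
c(TRUE, "fred")
#> [1] "TRUE" "fred"

# numeric to character
c(7, "b")
#> [1] "7" "b"
```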

### Coercion down one step

Sometimes things get coerced “downwards” (i.e., toward less general data types).

If the coercion doesn’t make sense you end up with `NA`, which is how R denotes missing data.

- numeric to logical (`0` ==> `FALSE`, anything else ==> `TRUE`); *always “makes sense”*
  - `0` ==> `FALSE`
  - `1` ==> `TRUE`
  - `78.2` ==> `TRUE`
  - `0.0001` ==> `TRUE`
  - `-563.3` ==> `TRUE`
- complex to numeric (discards the imaginary part and warns about it!)
  - `3.4+0i` ==> `3.4`
  - `5.6+7.6i` ==> `5.6` (+ a warning)

  ```
  # witness a warning:
  as.numeric(7.4+5i)
  #> Warning: imaginary parts discarded in coercion
  #> [1] 7.4
  ```

- character to complex
  - `"3.4+4i"` ==> `3.4+4i`
  - `"a"` ==> `NA` (you can’t coerce `"a"` to any number, reasonably)
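Downward coercion usually has to be requested explicitly. The sketch below uses the base R functions `as.logical()` and `as.complex()` on a few of the values listed above:

```r
# numeric to logical: only zero becomes FALSE
as.logical(c(0, 1, 78.2, -563.3))
#> [1] FALSE  TRUE  TRUE  TRUE

# character to complex: works when the string looks like a complex number
as.complex("3.4+4i")
#> [1] 3.4+4i

# ...and gives NA (with a warning) when it does not
as.complex("a")
#> [1] NA
```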

### Coercion down more than one step

Important point: it doesn’t *necessarily* go through intermediate steps:

- complex to logical (`0` ==> `FALSE`, anything else ==> `TRUE`)
  - `0+0i` ==> `FALSE`
  - `0+2i` ==> `TRUE`
  - `5+0i` ==> `TRUE`
  - `5+9i` ==> `TRUE`
- character to logical
  - `"TRUE"` ==> `TRUE`
  - `"FALSE"` ==> `FALSE`
  - `"1"` ==> `NA` (*yikes! if it went through numeric you’d get something different!*)
  - `"0"` ==> `NA`
- character to numeric
  - `"56.764"` ==> `56.764`
  - `"4+8i"` ==> `NA` (with a warning; the string is not parsed as complex first, so you do not get `4`)
  - `"fred"` ==> `NA`
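A brief sketch of these multi-step downward coercions, again using the `as.something` functions. The third call goes through numeric explicitly, to show how the result differs from coercing the strings directly:

```r
# complex to logical
as.logical(c(0+0i, 0+2i, 5+0i))
#> [1] FALSE  TRUE  TRUE

# character to logical: "1" and "0" are not recognized
as.logical(c("TRUE", "FALSE", "1", "0"))
#> [1]  TRUE FALSE    NA    NA

# going through numeric by hand gives a different answer
as.logical(as.numeric(c("1", "0")))
#> [1]  TRUE FALSE

# character to numeric: NA (with a warning) for "fred"
as.numeric(c("56.764", "fred"))
#> [1] 56.764     NA
```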

### Functions For Explicit Coercion

There is a whole family of functions for coercing objects between different modes (or different types) that take the form `as.something`:

- `as.logical(x)`
- `as.numeric(x)`
- `as.integer(x)` (not a mode; integer is a subtype of the `numeric` mode)
- `as.complex(x)`
- `as.character(x)`

As expected, these are vectorized: they coerce every element of the vector to the desired mode.
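As a small illustration of that vectorized behavior, here is a sketch applying several of these functions to a whole vector at once (the vector `x` is made up for the example):

```r
x <- c("1", "2.5", "10")

# every element is converted to numeric
as.numeric(x)
#> [1]  1.0  2.5 10.0

# as.integer() drops the fractional part
as.integer(x)
#> [1]  1  2 10

# numeric to complex, element by element
as.complex(c(1, 2))
#> [1] 1+0i 2+0i
```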

## Missing Data and Special Values in R

We saw `NA` up above. That means “Not Available” and it denotes missing data.

There are also two more interesting values:

- `Inf` (`-Inf`) means ∞ (or −∞) and arises from things like `1/0` (which gives `Inf`) or `log(0)` (which gives `-Inf`).
- `NaN` means “Not a Number” and it arises from situations where you can’t evaluate something and it doesn’t have an obvious limit, like `0/0` or `Inf/-Inf` or `0*Inf`.
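Typing the expressions mentioned above at the R prompt shows these special values directly:

```r
1/0
#> [1] Inf
log(0)
#> [1] -Inf
0/0
#> [1] NaN
Inf/-Inf
#> [1] NaN
0*Inf
#> [1] NaN
```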

If you wish to test whether something is `NaN` or `NA` you have `is.na(x)` and `is.nan(x)`, which return logical vectors. The same goes for testing if things are finite or infinite:

```
x <- c(NA, 2, Inf, 4, NaN, 6)
is.nan(x) # only the NaN
#> [1] FALSE FALSE FALSE FALSE TRUE FALSE
is.na(x) # both NA and NaN
#> [1] TRUE FALSE FALSE FALSE TRUE FALSE
is.infinite(x) # only Inf or -Inf
#> [1] FALSE FALSE TRUE FALSE FALSE FALSE
```

### Modes of Missing Data

Here is something to be aware of: missing values, like non-missing values, carry around their mode. Try this:

```
x <- c(1, 2, NA, 4, "5")
x
#> [1] "1" "2" NA "4" "5"
x[3] # this extracts the third element of x
#> [1] NA
c(10,20,30,x[3])
#> [1] "10" "20" "30" NA
c(10, 20, 30, NA) # this is a "fresh" NA, no coercion
#> [1] 10 20 30 NA
```
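To see the mode that an `NA` is carrying, `typeof()` can be applied to it. The sketch below also uses `NA_character_`, base R's constant for a character-mode missing value:

```r
# a "fresh" NA is logical, the least general mode, so it never forces coercion
typeof(NA)
#> [1] "logical"
typeof(c(10, 20, 30, NA))
#> [1] "double"

# an NA pulled out of a character vector is a character NA
x <- c(1, 2, NA, 4, "5")
typeof(x[3])
#> [1] "character"
identical(x[3], NA_character_)
#> [1] TRUE
```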

## Vectorization

- In R, the term *vectorization* refers to the fact that, in many cases, when you apply a function to a vector, it applies the function to every element of the vector.
- This is apparent in many of the *operators* and we will see it in plenty of other functions, too.
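Two base R functions, `sqrt()` and `nchar()`, give a quick sketch of what this looks like for functions (the input vectors are made up for illustration):

```r
# sqrt() is applied to every element
x <- c(1, 4, 9, 16)
sqrt(x)
#> [1] 1 2 3 4

# nchar() counts the characters in every string
nchar(c("fred", "foo"))
#> [1] 4 3
```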

### Most Operators are Vectorized

This is *incredibly important*! All the mathematical operators, like `+`, `-`, `*`, and the logical operators, like `&` (AND), `|` (OR), and the comparison operators, like `<` and `>`, are hungry to operate *element-wise* on every *element* of a vector. Example:

```
fish.lengths <- c(121, 95, 87, 142)
fish.weights <- c(1011, 505, 702, 900)
fish.fatness <- fish.weights / fish.lengths
fish.fatness
#> [1] 8.355372 5.315789 8.068966 6.338028
```
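The comparison and logical operators behave the same way. A short sketch reusing the fish vectors; the cutoffs of 100 and 950 are arbitrary values chosen for illustration:

```r
fish.lengths <- c(121, 95, 87, 142)
fish.weights <- c(1011, 505, 702, 900)

# comparison is done element by element
fish.lengths > 100
#> [1]  TRUE FALSE FALSE  TRUE

# so is logical AND
(fish.lengths > 100) & (fish.weights > 950)
#> [1]  TRUE FALSE FALSE FALSE
```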

### Vectorization is so important…

…that we are going to open up a whole new lecture that starts with it.
