Functions and lapply
Intro
R is known as a “functional” language in the sense that every operation it does can be be thought of a function that operates on arguments and returns a value. For the casual user of R, it is not clear whether thinking about this is helpful. BUT what is helpful to any user of R is the ability to understand how functions in R:
- Are called,
- Parse their arguments,
- Can be defined by the user (yes! you can make your own functions in R),
- Can be applied iteratively over elements of lists or vectors.
Once you get comfortable writing your own functions, you can save yourself a lot of time by:
- recognizing what parts of your code essesntially do the same things
- wrapping those into a function
- calling that function with different arguments.
This can be particularly useful if you want to apply the same analysis to multiple different data sets. You can write a function that will perform the analysis on a single, “generic” data set like one that you have, and then you can apply that function to multiple data sets that you might have.
Specific goals
- Show how you define functions
- Discuss parameters and arguments, and R’s system for default values and parsing of argument lists.
- Take a brief sojourn into the world of overloaded functions and R’s S3 object system.
- Show how you can apply a function to every member of a list with
lapply()
, and give an actual example.
User defined functions
We could start off talking about functions, generally, but it will be more fun, and more accessible to just start writing our own functions. In the process we will learn a lot about function conventions.
the function function
- There is a function in R called
function()
whose job is to return a function. - Let’s use it, and then talk about it.
- Here we make a function called evensum that adds up the elements in positions 2, 4, 6, … of a vector:
# define it
evensum <- function(x) sum(x[seq(2, length(x), by = 2)])
# use it!
evensum(x = 10:20)
#> [1] 75
- Wow! that was easy! How did that work?
The anatomy of function()
- The syntax of the
function()
function is relatively straightforward:- It takes arguments which are the names of the variables that act as placeholders for the values that you will pass into the function. These are called parameters.
- In the case of
evensum()
there is one parameter,x
. - In the body of the function, which is the expression that comes after
function(x)
we say what we want to do tox
(namely add up every other element of it, starting with the second element).
- In the case of
- When we call the function, that value that we “pass in for
x
” (which was the vector10:20
in our example) gets substituted for every occurrence ofx
in the body of the function. - When the function is exectuted it returns whatever value the expression that is its body returns.
- It takes arguments which are the names of the variables that act as placeholders for the values that you will pass into the function. These are called parameters.
Compound expressions
- Most functions are going to be more complex than just a single statement like
sum(x[seq(2, length(x), by = 2)])
, so how do we execute muliple statements in the body of a function and return the value that we want? - Answer: Use a compound expression which is merely a bunch of expressions wrapped up in curly braces.
- Here is an example of how we could have written
evensum
with multiple expressions and intermediate value assignments:
# define it
evensum <- function(x) {
idx <- seq(2, length(x), by = 2)
y <- x[idx]
sum(y)
}
# use it!
evensum(10:20)
#> [1] 75
# hooray! we got the same thing as last time
The value of a compound expression
It is important to understand that if you have a compound expression like:
then the value of this expression will be the value returned by the last expression inside the compound expression (in the above case, the value returned by{ statement_1 statement_2 statement_3 }
statement_3
)- Thus, when the body of a function is a compound expression, the value that the function returns will just be the value of the last expression in the body of the function.
You can also be explicit about it and wrap it in
return()
:
# define it using return
evensum <- function(x) {
idx <- seq(2, length(x), by = 2)
y <- x[idx]
return(sum(y))
}
# use it
evensum(10:20)
#> [1] 75
Multiple parameters
- Having multiple parameters that your function understands is straightforward. You just put them in the argument list of
function()
! - Imagine that we wanted to make a more general function of which
evensum
was a special case. Perhaps we want to be able to easily take any vector, start on the Start-th element and then sum it and every other element in steps of Step. Here goes:
skip_sum <- function(x, Start, Step) sum(x[seq(Start, length(x), by = Step)])
- Now, we can call that with values for
x
,Start
andStep
.
# Try this:
skip_sum(x = 10:20, Start = 4, Step = 3)
#> [1] 48
# this will give us the same results as evensum:
skip_sum(x = 10:20, Start = 2, Step = 2)
#> [1] 75
- Notice that in the two invocations of
skip_sum
above, we are using named parameters. i.e., we are sayingx = 10:20
andStart = 2
, etc. We are being very explicit about which arguments belong to which parameters.
In class exercise
OK, everyone, you have 5 minutes to write your own function called addmult
that takes two vectors, a
and b
and returns a named list in which the first component named “Add” is equal to a+b and the second component named “Mult” is equal to a*b.
Positional parameters and named parameters
- R is rather interesting in that you don’t have to give it named parameters. It can figure out what you mean as long as the order of arguments you give it is in the order of the parameters in the function definition:
# this:
skip_sum(10:20, 4, 3)
#> [1] 48
# will be the same as this:
skip_sum(x = 10:20, Start = 4, Step = 3)
#> [1] 48
- But, if the argument list is long, it is often easier to read (and less error-prone) to use named parameters.
- Note that named arguments don’t have to be in any particular order, if they are named!
# these will all be the same
skip_sum(x = 10:20, Start = 4, Step = 3)
#> [1] 48
skip_sum(Start = 4, Step = 3, x = 10:20)
#> [1] 48
skip_sum(Start = 4, x = 10:20, Step = 3)
#> [1] 48
Mixing named and positional parameters
This is far out. You can mix named and positional parameters. R’s rule is this:
- First match all the named parameters to named arguments and then move them off the argument list and the parameter list.
- Then match the remaining arguments to the remaining parameters positionally.
# for example these will all be the same
skip_sum(x = 10:20, Start = 4, Step = 3)
#> [1] 48
skip_sum(10:20, Start = 4, Step = 3)
#> [1] 48
skip_sum(Start = 4, Step = 3, 10:20)
#> [1] 48
skip_sum(Start=4, 10:20, 3)
#> [1] 48
Defining default parameter values
- What if we realized that most the time we are using
skip_sum
we just want it set withStart = 2
andStep = 2
so that it behaves likeevensum
. - This is a job for defaults!
- You can set default values for parameters by using an
=
sign in the function definition. If no arguments are passed in for those parameters, then the defaults are applied.
# new definition with defaults
skip_sum <- function(x, Start = 2, Step = 2) sum(x[seq(Start, length(x), by = Step)])
# now see how it behaves:
skip_sum(10:20)
#> [1] 75
skip_sum(10:20, 2, 2)
#> [1] 75
skip_sum(10:20, Step = 2)
#> [1] 75
skip_sum(10:20, 4, 5) # this overrides the defaults
#> [1] 31
The Curious “…”" parameter
- Sometimes, it would be nice to be able to pass other named parameters to a function that is called from inside your own function. This is what the
...
parameter is for. - You have probably seen
...
in the help files for certain functions. It just means “any other named parameters that make sense”. - Of course, they only make sense if your function takes whatever else was passed in and does something appropriate with them.
- Example: our
skip_sum
function won’t do so well with NAs:
vec <- 10:20
vec[c(3,5,8,9)] <- NA
# here is what vec looks like
vec
#> [1] 10 11 NA 13 NA 15 16 NA NA 19 20
# try it:
skip_sum(vec)
#> [1] NA
# Wait, I want it to treat NAs as zeroes
- If you want it to treat NAs as zeroes you can redefine
skip_sum
like this:
skip_sum <- function(x, Start = 2, Step = 2, ...) sum(x[seq(Start, length(x), by = Step)], ...)
- Note the “…” in the argument list and in the body. It is being passed in as an argument to the
sum
function. - Try it:
skip_sum(vec, na.rm = TRUE) # we pass in a named argument that does not match Start, or Step.
#> [1] 58
# that gave us the same as:
skip_sum({vec[is.na(vec)]<-0; vec}) # note use of compound expression...
#> [1] 58
# If we don't pass in na.rm = TRUE then it doesn't get passed to sum:
skip_sum(vec)
#> [1] 58
Listing a function
- If you type a function name, without the parentheses, R will list the code that went into the function definition
- Try it:
# list evensum
evensum
#> function(x) {
#> idx <- seq(2, length(x), by = 2)
#> y <- x[idx]
#> return(sum(y))
#> }
# list skip_sum
skip_sum
#> function(x, Start = 2, Step = 2, ...) sum(x[seq(Start, length(x), by = Step)], ...)
# have a look at read.csv, which is just read.table with some defaults:
read.csv
#> function (file, header = TRUE, sep = ",", quote = "\"", dec = ".",
#> fill = TRUE, comment.char = "", ...)
#> read.table(file = file, header = header, sep = sep, quote = quote,
#> dec = dec, fill = fill, comment.char = comment.char, ...)
#> <bytecode: 0x7fda9c595d58>
#> <environment: namespace:utils>
A Brief Journey into S3 function overloading
What’s this UseMethod stuff?
- Sometimes, when you list a function, like
print
for example, you get something like this:
print
#> function (x, ...)
#> UseMethod("print")
#> <bytecode: 0x7fda9e3a2d48>
#> <environment: namespace:base>
- Hey! I don’t see any code listing there! What is going on!
- The
UseMethod
call in there actually is the function inside theprint
function. UseMethod
is the (somewhat klugie and limited) way that R overloads its functions.- What is overloading? It means that if you pass something to the
print
function, it will behave differently depending on what the class of the first argument is. - Aha! This is why data frames print out differently than lists, etc.
How this works
- The print function has been defined so that when it is called it looks to see what class the first argument (corresponding to the named parameter
x
). let’s say that argument is of classweird
. In that case, the print function looks for another function calledprint.weird()
. If it finds it, it will execute that function on the argument.
How many class-specific print functions are there?
You can list them with methods()
methods(print)
#> [1] print.acf*
#> [2] print.AES*
#> [3] print.anova*
#> [4] print.aov*
#> [5] print.aovlist*
#> [6] print.ar*
#> [7] print.Arima*
#> [8] print.arima0*
#> [9] print.AsIs
#> [10] print.aspell*
#> [11] print.aspell_inspect_context*
#> [12] print.bibentry*
#> [13] print.Bibtex*
#> [14] print.browseVignettes*
#> [15] print.by
#> [16] print.changedFiles*
#> [17] print.check_code_usage_in_package*
#> [18] print.check_compiled_code*
#> [19] print.check_demo_index*
#> [20] print.check_depdef*
#> [21] print.check_dotInternal*
#> [22] print.check_make_vars*
#> [23] print.check_nonAPI_calls*
#> [24] print.check_package_code_assign_to_globalenv*
#> [25] print.check_package_code_attach*
#> [26] print.check_package_code_data_into_globalenv*
#> [27] print.check_package_code_startup_functions*
#> [28] print.check_package_code_syntax*
#> [29] print.check_package_code_unload_functions*
#> [30] print.check_package_compact_datasets*
#> [31] print.check_package_CRAN_incoming*
#> [32] print.check_package_datasets*
#> [33] print.check_package_depends*
#> [34] print.check_package_description*
#> [35] print.check_package_description_encoding*
#> [36] print.check_package_license*
#> [37] print.check_packages_used*
#> [38] print.check_po_files*
#> [39] print.check_Rd_contents*
#> [40] print.check_Rd_line_widths*
#> [41] print.check_Rd_metadata*
#> [42] print.check_Rd_xrefs*
#> [43] print.check_so_symbols*
#> [44] print.check_T_and_F*
#> [45] print.check_vignette_index*
#> [46] print.checkDocFiles*
#> [47] print.checkDocStyle*
#> [48] print.checkFF*
#> [49] print.checkRd*
#> [50] print.checkReplaceFuns*
#> [51] print.checkS3methods*
#> [52] print.checkTnF*
#> [53] print.checkVignettes*
#> [54] print.citation*
#> [55] print.codoc*
#> [56] print.codocClasses*
#> [57] print.codocData*
#> [58] print.colorConverter*
#> [59] print.compactPDF*
#> [60] print.condition
#> [61] print.connection
#> [62] print.data.frame
#> [63] print.Date
#> [64] print.default
#> [65] print.dendrogram*
#> [66] print.density*
#> [67] print.difftime
#> [68] print.dist*
#> [69] print.DLLInfo
#> [70] print.DLLInfoList
#> [71] print.DLLRegisteredRoutines
#> [72] print.dummy_coef*
#> [73] print.dummy_coef_list*
#> [74] print.ecdf*
#> [75] print.factanal*
#> [76] print.factor
#> [77] print.family*
#> [78] print.fileSnapshot*
#> [79] print.findLineNumResult*
#> [80] print.formula*
#> [81] print.ftable*
#> [82] print.function
#> [83] print.getAnywhere*
#> [84] print.glm*
#> [85] print.hclust*
#> [86] print.help_files_with_topic*
#> [87] print.hexmode
#> [88] print.HoltWinters*
#> [89] print.hsearch*
#> [90] print.htest*
#> [91] print.html*
#> [92] print.infl*
#> [93] print.integrate*
#> [94] print.isoreg*
#> [95] print.kmeans*
#> [96] print.knitr_kable*
#> [97] print.Latex*
#> [98] print.LaTeX*
#> [99] print.libraryIQR
#> [100] print.listof
#> [101] print.lm*
#> [102] print.loadings*
#> [103] print.loess*
#> [104] print.logLik*
#> [105] print.ls_str*
#> [106] print.medpolish*
#> [107] print.MethodsFunction*
#> [108] print.mtable*
#> [109] print.NativeRoutineList
#> [110] print.news_db*
#> [111] print.nls*
#> [112] print.noquote
#> [113] print.numeric_version
#> [114] print.object_size*
#> [115] print.octmode
#> [116] print.packageDescription*
#> [117] print.packageInfo
#> [118] print.packageIQR*
#> [119] print.packageStatus*
#> [120] print.pairwise.htest*
#> [121] print.PDF_Array*
#> [122] print.PDF_Dictionary*
#> [123] print.pdf_doc*
#> [124] print.pdf_fonts*
#> [125] print.PDF_Indirect_Reference*
#> [126] print.pdf_info*
#> [127] print.PDF_Keyword*
#> [128] print.PDF_Name*
#> [129] print.PDF_Stream*
#> [130] print.PDF_String*
#> [131] print.person*
#> [132] print.POSIXct
#> [133] print.POSIXlt
#> [134] print.power.htest*
#> [135] print.ppr*
#> [136] print.prcomp*
#> [137] print.princomp*
#> [138] print.proc_time
#> [139] print.raster*
#> [140] print.Rd*
#> [141] print.recordedplot*
#> [142] print.restart
#> [143] print.RGBcolorConverter*
#> [144] print.rle
#> [145] print.roman*
#> [146] print.sessionInfo*
#> [147] print.shiny.tag*
#> [148] print.shiny.tag.list*
#> [149] print.simple.list
#> [150] print.smooth.spline*
#> [151] print.socket*
#> [152] print.srcfile
#> [153] print.srcref
#> [154] print.stepfun*
#> [155] print.stl*
#> [156] print.StructTS*
#> [157] print.subdir_tests*
#> [158] print.summarize_CRAN_check_status*
#> [159] print.summary.aov*
#> [160] print.summary.aovlist*
#> [161] print.summary.ecdf*
#> [162] print.summary.glm*
#> [163] print.summary.lm*
#> [164] print.summary.loess*
#> [165] print.summary.manova*
#> [166] print.summary.nls*
#> [167] print.summary.packageStatus*
#> [168] print.summary.ppr*
#> [169] print.summary.prcomp*
#> [170] print.summary.princomp*
#> [171] print.summary.table
#> [172] print.summaryDefault
#> [173] print.table
#> [174] print.tables_aov*
#> [175] print.terms*
#> [176] print.ts*
#> [177] print.tskernel*
#> [178] print.TukeyHSD*
#> [179] print.tukeyline*
#> [180] print.tukeysmooth*
#> [181] print.undoc*
#> [182] print.vignette*
#> [183] print.warnings
#> [184] print.xgettext*
#> [185] print.xngettext*
#> [186] print.xtabs*
#>
#> Non-visible functions are asterisked
OK! That is a bunch. But notice that there is not a print.weird
function.
Make a new print function
I am going to make a print function that will be invoked on objects of class weird:
print.weird <- function(x) {
print("Object of class weird")
print(paste("Length:", length(x)))
print(head(x))
}
Our print.weird function doesn’t do much, it just shows the length and the first few lines, and lets the user know the object is of class weird.
Let’s watch it in action:
boing <- 1:100
print(boing) # does what we expect
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
#> [18] 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#> [35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51
#> [52] 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#> [69] 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85
#> [86] 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
# make boing an object of class weird
class(boing) <- "weird"
print(boing)
#> [1] "Object of class weird"
#> [1] "Length: 100"
#> [1] 1 2 3 4 5 6
It is good keep in mind that R is full of overloaded functions — that is functions that behave differently depending on the class of their arguments.
Applying functions to list elements
One of the great things about understanding how to define your own functions is that it lets you harness the power of the lapply()
function which takes two main arguments:
- a list (really any vector…)
- a function
And it cycles over the list and applies the function to each of the list’s components, and returns the results in a list!
Here is its usage from its help file: lapply(X, FUN, ...)
An example: baby rockfish
To motivate our discussion of lapply()
I have a simple example. In the directory data/rockfish_larvae
there are 7 files, each with the genotypes of 96 larval rockfish that are the offspring of a single female. Here is what one file looks like:
cat(readLines("data/rockfish_larvae/K17larvae.txt")[1:10], sep = "\n")
#> Loc1_a Loc1_b Loc2_a Loc2_b Loc3_a Loc3_b Loc4_a Loc4_b Loc5_a Loc5_b Loc6_a Loc6_b Loc7_a Loc7_b
#> 227 267 127 127 155 187 128 133 184 184 85 87 275 278
#> 231 267 123 127 159 169 128 133 184 184 85 85 275 278
#> 217 223 127 131 159 169 123 133 184 188 0 0 275 278
#> 217 219 127 127 187 187 128 133 184 184 85 87 275 275
#> 217 227 127 131 187 187 128 133 184 186 85 85 275 278
#> 231 267 123 127 187 193 123 123 184 184 85 85 275 278
#> 217 219 127 127 155 187 128 133 184 186 85 89 275 275
#> 217 223 123 127 187 193 123 133 184 184 85 85 275 275
#> 0 0 127 127 187 187 128 133 184 184 0 0 275 278
Each pair of columns are the genotypes at a single location (a locus) in the genome. The numbers refer to different alleles. 0’s denote missing data.
One quick and dirty way of detecting whether a rockfish mother has mated with more than one male is to see if any loci have more than 4 alleles amongst the female’s offspring.
So, our goal is to cycle over the 7 files, read them in, cycle over the loci, and for each locus, count the number of each allele observed, and ultimately count up the number of alleles.
Here goes…
Step 1: Get a named vector of file names
- We can use the
dir()
function to get the paths to the files we want. - We will throw some regex foo in there to name the elements of the vector the way we want:
library(stringr)
fpaths <- dir("data/rockfish_larvae", full.names = TRUE)
# they look like this
fpaths
#> [1] "data/rockfish_larvae/K17larvae.txt"
#> [2] "data/rockfish_larvae/K18larvae.txt"
#> [3] "data/rockfish_larvae/K20larvae.txt"
#> [4] "data/rockfish_larvae/K22larvae.txt"
#> [5] "data/rockfish_larvae/K23larvae.txt"
#> [6] "data/rockfish_larvae/K24larvae.txt"
#> [7] "data/rockfish_larvae/K26larvae.txt"
# grab the "nice" part to be their names
names(fpaths) <- str_match(fpaths, "data.*/(K[12][0-9]larvae)")[,2]
# here is what we now have:
fpaths
#> K17larvae K18larvae
#> "data/rockfish_larvae/K17larvae.txt" "data/rockfish_larvae/K18larvae.txt"
#> K20larvae K22larvae
#> "data/rockfish_larvae/K20larvae.txt" "data/rockfish_larvae/K22larvae.txt"
#> K23larvae K24larvae
#> "data/rockfish_larvae/K23larvae.txt" "data/rockfish_larvae/K24larvae.txt"
#> K26larvae
#> "data/rockfish_larvae/K26larvae.txt"
Step 2: Read the files into a list of 7 data frames
Here we can use lapply()
.
Here is one way to do it:
# note we are lapplying over a character vector. But the result
# still comes back as a list!
dframes <- lapply(fpaths, function(x) read.table(x, header = TRUE, na.strings = "0"))
And this is another way we could do it, using the … to pass the extra named parameters to read.table
dframes <- lapply(fpaths, read.table, header = TRUE, na.strings = "0")
Two important points to make:
function(x) read.table(x, header = TRUE, na.strings = "0")
returns a function which is used aslapply()
’sFUN
argument. That expression is as good as passing in the name of a function. Pretty cool…- You might see this sort of construction where a function is defined but not returned into a variable called an anonymous function.
- Defining a function and being explicit about passing the argument in is more flexible than passing the name of a function and extra named parameters.
Step 3: Make sure we have what we want
Note that dframes
is a list of length 7, and it has names that are appropriate:
length(dframes)
#> [1] 7
names(dframes)
#> [1] "K17larvae" "K18larvae" "K20larvae" "K22larvae" "K23larvae" "K24larvae"
#> [7] "K26larvae"
This shows that lapply()
propagates names to the list that it returns.
It would be nice to make sure that every component of it was a data frame of the correct size.
Class exercise: Use lapply to quickly compute the dimensions of every data frame that was just read in.
I want to pause for a moment and reiterate that each component of the list dframes
is a data frame! Remember that a list can store any object of any class or structure. It is easy to forget that….But when you remember it, you can have fun throwing all manner of objects into lists if need be.
Step 4: Prepare to count alleles of different types
- These data frames are not particularly tidy. It is particularly annoying that data for each locus are in two separate columns.
- There are lots of ways we could deal with this. One would be to reshape the data into one big tidy data frame with five colums (Mother, Larva, Locus, GeneCopy (a or b), Allele).
- But since we are working with lapply, we will do it differently.
- We are just counting up the alleles, so we can just stack the first and second columns for each locus on top of each other. Then all the alleles are in a single vector. Here we will do this.
- We can experiment with a single component first. It is convenient to call it
x
.
x <- dframes[[1]]
first_cols <- x[, c(T,F)] # pick out the first columns of each locus
second_cols <- x[, c(F,T)] # second columns of each locus
# now, name the colums so they are the same, and just refer to locus
# this involves a stringr function
names(first_cols) <- str_replace(names(first_cols), "_a", "")
names(second_cols) <- str_replace(names(second_cols), "_b", "")
# now, stack those up:
tmp <- rbind(first_cols, second_cols)
# see how big it is and what it looks like
dim(tmp)
#> [1] 192 7
head(tmp)
#> Loc1 Loc2 Loc3 Loc4 Loc5 Loc6 Loc7
#> 1 227 127 155 128 184 85 275
#> 2 231 123 159 128 184 85 275
#> 3 217 127 159 123 184 NA 275
#> 4 217 127 187 128 184 85 275
#> 5 217 127 187 128 184 85 275
#> 6 231 123 187 123 184 85 275
- OK! That works, but it was only for a single component that we had named
x
. We want to apply the same series of operations to every single component ofdframes
.- This is a job for
lapply
and a function!
- This is a job for
- Here we define a function and apply it:
# define a function of x (see how useful it was to call that thing x when we were experimenting?)
stack_em <- function(x) {
first_cols <- x[, c(T,F)]
second_cols <- x[, c(F,T)]
names(first_cols) <- str_replace(names(first_cols), "_a", "")
names(second_cols) <- str_replace(names(second_cols), "_b", "")
rbind(first_cols, second_cols)
}
# now, lapply that function over dframes:
dframes_stacked <- lapply(dframes, stack_em)
# note that we also could have done:
dframes_stacked <- lapply(dframes, function(x) {
first_cols <- x[, c(T,F)]
second_cols <- x[, c(F,T)]
names(first_cols) <- str_replace(names(first_cols), "_a", "")
names(second_cols) <- str_replace(names(second_cols), "_b", "")
rbind(first_cols, second_cols)
}
)
Step 5: Count up the occurrences of each allele in each column of dframes_stacked
- Recall that a data frame is just a special kind of a list. The columns of the data frame are the components of the list. So you can lapply over them.
- We will apply the table function to each column of each component of dframes_stacked.
- Far out! That requires nested lapply()’s:
alle_counts <- lapply(dframes_stacked, function(x) lapply(x, table))
# see what the first component of the result looks like:
alle_counts[1]
#> $K17larvae
#> $K17larvae$Loc1
#>
#> 217 219 223 227 231 267
#> 50 24 20 33 17 44
#>
#> $K17larvae$Loc2
#>
#> 123 127 131
#> 19 123 48
#>
#> $K17larvae$Loc3
#>
#> 155 159 169 187 193
#> 28 20 48 76 16
#>
#> $K17larvae$Loc4
#>
#> 123 128 133
#> 63 75 50
#>
#> $K17larvae$Loc5
#>
#> 184 186 188
#> 144 28 20
#>
#> $K17larvae$Loc6
#>
#> 85 87 89
#> 117 44 23
#>
#> $K17larvae$Loc7
#>
#> 275 278
#> 138 50
Step 6: Count up the total number of alleles at each locus
- The above result shows clear evidence of having more than four alleles, at least among some loci.
- But it is hard to look at. Can we summarize it further? Yes!
- We can take the length of each component to see how many distinct alleles there were:
list_result <- lapply(alle_counts, function(x) lapply(x, length))
# what does it look like?
list_result[1]
#> $K17larvae
#> $K17larvae$Loc1
#> [1] 6
#>
#> $K17larvae$Loc2
#> [1] 3
#>
#> $K17larvae$Loc3
#> [1] 5
#>
#> $K17larvae$Loc4
#> [1] 3
#>
#> $K17larvae$Loc5
#> [1] 3
#>
#> $K17larvae$Loc6
#> [1] 3
#>
#> $K17larvae$Loc7
#> [1] 2
- OK, that is nice, but it is hard to look at as a list.
Step 6 do-over: using sapply
- Because each locus yields just a single number, and there are exactly 7 loci per mother, we could simplify all these results into a table that is easier to look at.
- A nice function for this is
sapply()
which does this:- Run
lapply
- Pass the result through the
simplify2array
function
- Run
simplify2array()
looks at a list, and if it could be simply expressed as an array (like a matrix) without losing any information, then it does that, whilst preserving the names in a sensible fashion.- Check this out:
array_result <- sapply(alle_counts, function(x) sapply(x, length))
array_result
#> K17larvae K18larvae K20larvae K22larvae K23larvae K24larvae K26larvae
#> Loc1 6 4 4 4 5 4 5
#> Loc2 3 5 4 4 3 4 3
#> Loc3 5 5 4 3 3 5 4
#> Loc4 3 3 3 3 3 3 3
#> Loc5 3 3 3 3 2 3 2
#> Loc6 3 2 2 2 3 2 3
#> Loc7 2 2 2 2 2 2 2
Wrap up
- we’ve just scratched the surface of a whole family of
apply
-like functions that are present in R. - If you want to get more into them, I recommend Hadley Wickham’s
plyr
package. - If you are careful about keeping your data in a tidy format, then you can probably just use Hadley’s
dplyr
package which is phenomenally cool. - What you’ve learned here about functions will be useful all over the R world.
comments powered by Disqus