Reproducible Research for Conservation — Mérida, Mexico
2022-01-12
Preface
This is the website/book associated with the UCMexus Conservation Genomics Workshop 2022, held January 10th and 11th in Mérida, Mexico.
0.1 Setting up your computer
This course covers topics in landscape genetics/genomics, and relies heavily on the R programming language. In order to follow along with the code and be successful in running all of the examples, it is imperative to have very recent versions of R and RStudio, and updated versions of a number of packages.
The following is a description of the software needed to engage in the course. This setup was tested on a Mac running Mojave 10.14.6, but should also work on most other Mac or Windows operating systems.
If you are running Linux, you will need to install some external dependencies yourself (such as the GEOS library). On Mac and Windows, these come bundled in the binary versions of the packages ‘terra’ and ‘sf’ on CRAN.
0.1.1 Step 1. Get the latest versions of R and RStudio
- First, install the latest version of R. Go to https://cran.r-project.org/ and follow the appropriate link to Download and Install, depending on your operating system (Linux, MacOS, or Windows).
  - For Mac, you can download R-4.1.2.pkg and install.
  - For Windows, first go into the base directory and Download R 4.1.2, and install it. THEN, go back to the “R for Windows” page where you clicked into base, and download and install the Rtools as well. The latter gives you tools for building packages, which is required for a few packages we use.
- Download and install the latest stable version of RStudio for your operating system. Go to https://www.rstudio.com/products/rstudio/download/ and choose the big blue download button for “RStudio Desktop, Open Source License, Free,” then hit the download button on the next page and follow instructions to install RStudio.
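Once R is installed, a quick check from the R console can confirm which version is running. This is only a minimal sketch: the 4.1.2 threshold is simply the release named above, used here as an assumed cut-off rather than a hard requirement.

```r
# Compare the running R version against the release named above (4.1.2).
# The threshold is an assumption for illustration; adjust it as needed.
min_version <- "4.1.2"
r_is_recent <- getRversion() >= min_version

if (r_is_recent) {
  message("R ", as.character(getRversion()), " looks recent enough.")
} else {
  message("R ", as.character(getRversion()), " is older than ", min_version,
          "; consider updating.")
}
```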
0.1.2 Step 2. Install a number of R packages that are relatively easy to install
Our work will require a number of packages that can be found in binary form on CRAN. As such, installing them is typically not too arduous.
Sometimes, when installing packages, you may get a message telling you that a later version of the package you want is available in source form than in binary form. Typically, it is still easiest, fastest, and usually reliable, to just use the binary form. So, after R gives you such a message, if you see R asking a question like:
Do you want to install from sources the package which needs compilation? (Yes/no/cancel)
Usually the appropriate answer is to type no into the console, and hit return.
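If you would rather not answer this question every time, R has a documented option that controls whether install.packages() looks for newer source versions at all. The sketch below sets it for the current session only. It is one possible configuration, not something the workshop requires, and it should be restored before installing any package that is only available in source form.

```r
# Ask install.packages() not to check for newer source versions, so the
# "install from sources?" question is not raised for packages with binaries.
# options() returns the previous values, which we keep for restoring later.
old_opts <- options(install.packages.check.source = "no")

# ... run your install.packages() calls for binary packages here ...

# To go back to the previous behaviour afterwards, run:
# options(old_opts)
```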
I have found that sometimes, when requesting the installation of a large number of packages, there can be the occasional problem. Sometimes, an error message tells you which package (often a dependency) failed to install. If that is the case, try to install the failed package by itself with the install.packages() function, directly, and then re-run the install.packages() command that originally failed.
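When the error message does not make it obvious which package failed, one way to find out is to compare the packages you asked for against what is actually installed. The helper below, missing_packages(), is a hypothetical name introduced here for illustration. The example vector uses only ‘terra’ and ‘sf’ because those are the packages named earlier in this chapter.

```r
# Return the subset of `pkgs` that is not currently installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Substitute the full vector of packages you tried to install.
wanted <- c("terra", "sf")
still_missing <- missing_packages(wanted)

if (length(still_missing) > 0) {
  message("Not yet installed: ", paste(still_missing, collapse = ", "))
  # Install each failed package by itself, then re-run the original call:
  # for (p in still_missing) install.packages(p)
} else {
  message("All requested packages are installed.")
}
```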
You can install packages in a few rounds of different types. Note that I tend to use the RStudio CRAN mirror.
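For reference, a CRAN mirror can be chosen for the current session by setting the repos option before calling install.packages(). This is a small sketch that assumes the RStudio mirror at https://cran.rstudio.com/; any other CRAN mirror would work the same way.

```r
# Point the CRAN entry of the "repos" option at the RStudio mirror,
# leaving any other configured repositories untouched.
repos <- getOption("repos")
repos["CRAN"] <- "https://cran.rstudio.com/"
options(repos = repos)

# Show the repositories that install.packages() will now use.
print(getOption("repos"))
```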
0.1.2.4 The ‘gradientForest’ package
Now, we come to a somewhat harder package to install, because it requires some compilation. We use it for the gradient random forest analysis on the last day. ‘gradientForest’ requires the dependency ‘extendedForest,’ an R package for classification and regression based on a forest of trees using random inputs.
Both of these packages require compilation, it seems. With the newest version of R, ‘extendedForest’ is installed automatically as a dependency.
Compiling packages might require a little more configuration.
If you are using a Mac
You should be running R 4.0.0 or above. In that case, to build packages, you need the Xcode Command Line Tools. If you don’t already have these, then you should install them. Doing so requires administrator access on your computer. Open the “Terminal” app and type:
xcode-select --install
into the command line. When asked for your password, provide it. Then click to agree on the software license agreement, and finally click the blue “install” button that comes up on the other screen.
Then, you also need to get the gFortran compiler for Mac. Information about this can be found by going to https://mac.r-project.org/tools/. Find the link (inside the yellow boxes) for your type of Mac (Intel or ARM64), and download the appropriate installer for the gFortran compiler. Then install it. In my case, since I have an Intel Mac, I downloaded gfortran-8.2-Mojave.dmg. To install it, you have to double click it and then go inside the folder that it creates, to find the gfortran.pkg, which you can double click to launch the installer. Install it to the default location.
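After installing these tools, you can ask R which of them it can see on your PATH. The sketch below uses Sys.which(). The three tool names checked are an assumption about what a typical Mac build needs, and an empty result does not necessarily mean compilation will fail, because R may locate a compiler through its own configuration rather than through the PATH.

```r
# Look up a few build tools on the PATH. Sys.which() returns "" for
# anything it cannot find.
tools <- Sys.which(c("make", "gfortran", "clang"))
found <- unname(nzchar(tools))

# Summarise the lookup as a small table.
tool_report <- data.frame(
  tool = names(tools),
  path = unname(tools),
  found = found,
  row.names = NULL
)
print(tool_report)
```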
On my Mac running Mojave (10.14.6) I got errors when trying to compile this package. It was unable to find the C ‘stdlib.h’ header file. So, I ended up also having to do this:
sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg -target /
If you have compilation problems on your Mac, you might try something similar. But you might have to name the headers for your own system, perhaps by tab completing on:
sudo installer -pkg /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10
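As an alternative to tab completion, the same directory can be listed from R to see which header packages, if any, are present. This is a sketch that assumes the Command Line Tools live in the directory used in the command above. On systems where that directory does not exist, the result is simply empty.

```r
# Directory taken from the installer command above.
pkg_dir <- "/Library/Developer/CommandLineTools/Packages"

# List any SDK header packages found there (an empty vector if none exist).
header_pkgs <- list.files(pkg_dir, pattern = "^macOS_SDK_headers",
                          full.names = TRUE)

if (length(header_pkgs) > 0) {
  print(header_pkgs)
} else {
  message("No macOS SDK header packages found in ", pkg_dir)
}
```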
If you have a Windows computer
We haven’t tested it, but you should be just fine so long as you installed the Rtools as described above.
Finally, if you have set up your system on Mac or Windows, install the package
The package we need is not on CRAN. Rather, it is on the R-forge repository.
In this case, we might be asked:
Package which is only available in source form, and may need compilation of
C/C++/Fortran: ‘extendedForest’
Do you want to attempt to install these from sources? (Yes/no/cancel)
And, for this one, we need to respond Yes. The command to run is:
install.packages("gradientForest", repos="https://R-Forge.R-project.org")
If all goes well, this should compile for you.
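To confirm that compilation succeeded, you can check whether both packages can be loaded. This is a minimal sketch using requireNamespace(), which returns FALSE rather than raising an error when a package is missing.

```r
# Check that gradientForest and its dependency are installed and loadable.
needed <- c("extendedForest", "gradientForest")
ok <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(ok)

if (!all(ok)) {
  message("Still missing: ", paste(needed[!ok], collapse = ", "))
}
```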
0.1.3 Step 3. Make sure you have git and an account on GitHub
In a two-day workshop, we don’t have time to go deeply, if much at all, into the many uses of the version control software, git, and the cloud-based code management system GitHub, that is built upon git. But, if you are interested in version control for your analyses, and you are interested in using GitHub to share and present the results of your research, then you really will want to become proficient with both git and GitHub.
Fortunately, there is an outstanding, free book on the web that goes into great detail about how to use git and GitHub with R and RStudio. It is available at https://happygitwithr.com/, and it is well worth a read. In particular, it is worth following the steps in:
- Chapter 4. Register a GitHub account.
- Chapter 6. Install Git (Note: Mac users, if xcode-select --install ran successfully, then git will have been installed).
- Chapter 7. Introduce yourself to git
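A quick way to see whether git is already available is to look for it from the R console. The sketch below only checks the PATH that R sees and prints the version if git is found.

```r
# Locate git on the PATH; Sys.which() returns "" if it is not found.
git_path <- Sys.which("git")

if (nzchar(git_path)) {
  # Ask git to report its version.
  print(system2(git_path, "--version", stdout = TRUE))
} else {
  message("git was not found on the PATH; see Chapter 6 of the book above.")
}
```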
If you want to use GitHub, you will also have to establish an SSH public/private key pair to authenticate your computer to GitHub. That is described in:
- Chapter 10: Set up keys for SSH