To work through this, clone the repository (an RStudio project) at https://github.com/eriqande/make-a-BGP-map
to get all the necessary input files, etc. Then open up the RStudio
project and run though Make-a-BGP-map-Notebook.Rmd
.
This chronicles the steps taken whilst making a genoscape for willow flycatchers. The order in which we discuss building up this map is different from the order in which we put layers down to actually make the map. Here, we will start with making the genoscape, which is actually the part that sits atop the whole map. But we do that because that is the part that will actually change from species to species, and we want everyone to be relatively fresh for understanding those mutable parts. Once the genoscape is dealt with, we will then talk about the next layer of country and state boundaries, and also coastline polygons. We do this, because it doesn’t take too long to plot these features, and using them is a good way to figure out the desired extent of your map and to decide upon a projection. Finally, we will talk about the bottom layer of the map, which is the raster with earthforms and shading from Natural Earth Data.
This work draws on a few functions that I have in packages that I have up on GitHub, namely:
tess3r
. Note that you can’t
use the default version of tess3r
, you have to use my fork
of it, which has some extra functionality.genoscapeRtools
Get those packages like this:
remotes::install_github("eriqande/TESS3_encho_sen") # for special version of tess3r
remotes::install_github("eriqande/genoscapeRtools") # for Eric's genoscapeRtools
The rest of the packages you need can be downloaded from CRAN. If you
don’t have them you should get them: raster
,
sf
, fields
, downloader
, and
tidyverse
. The last one there gets ggplot2 and a number of
other packages by Hadley Wickham.
You can get those like this:
install.packages(c("raster", "sf", "tidyverse", "fields", "downloader"))
library(raster) # important to load before tidyverse, otherwise it masks select()
library(tidyverse)
library(sf)
library(ggspatial)
The genoscape can be thought of as the bright colors smeared across space that show where different genetically identifiable groups of birds reside on the breeding grounds. These genoscapes are stored as rasters, and transparency is used to indicate how much confidence one has in the genetic identification of individuals in different areas. These rasters are made by interpolating Q-values from a program like STRUCTURE or ADMIXTURE between individuals that were sampled in space.
We need a matrix of Q-values for individuals. We have one that we will read in as a tibble
Q_tibble <- read_table("inputs/breeding-bird-Q-values.txt")
Q_tibble
Column id
is the sample name and the rest are ancestry
fractions to different clusters (named with three characters) estimated
by STRUCTURE.
We also need to know where those individuals were sampled in space. We have that here:
LatLong_tibble <- read_tsv("inputs/breeding-bird-lat-longs.tsv")
LatLong_tibble
The Q-values correspond to different clusters, as shown above. We must specify the colors that we wish to assign to each of those clusters. We do that with a named vector like this:
cluster_colors <- c(
INW = "#984ea3",
EST = "#377eb8",
PNW = "#4daf4a",
SSW = "#ff7f00",
SCC = "#ffff33",
KER = "#e41a1c",
WMT = "#00ffff"
)
For fun, we can plot these to see what they look like:
enframe(cluster_colors) %>%
mutate(x = 1:n(),
y = 1:n()) %>%
ggplot(aes(x = x, y = y, fill = name)) +
geom_point(pch = 21, size = 12) +
scale_fill_manual(values = cluster_colors)
Finally, we need to have a GIS Shapefile that tells us the range of
the breeding birds, so that genoscape can be clipped properly. We read
this shapefile with the st_read()
function from package
sf
.
breeding_range <- st_read("inputs/wifl-shapefiles-revised/WIFLrev.shp")
For grins, we can plot those polygons to see what they look like:
ggplot(breeding_range) +
geom_sf() +
theme_bw()
Within Eric’s fork of tess3r
is a function called
tess3Q_map_rasters
. It takes input from the objects we have
above, but it takes that input as matrices rather than data frames, etc.
so there is a little finagling to be done.
We need to ensure that we have values for birds in the right order (a
job for aleft_join
), and we also have to make it a matrix
with Longitude in the first column and Lat in the second.
long_lat_tibble <- Q_tibble %>%
select(id) %>%
left_join(LatLong_tibble, by = "id") %>%
select(Long, Lat)
long_lat_matrix <- long_lat_tibble %>%
as.matrix()
Pull off the names of individuals and make a matrix of it:
Q_matrix <- Q_tibble %>%
select(-id) %>%
as.matrix()
For this, we use the above variables in
tess3r::tess3Q_map_rasters()
. Note the use of namespace
addressing for this function rather than loading the whole
tess3r
package with the library()
command.
genoscape_brick <- tess3r::tess3Q_map_rasters(
x = Q_matrix,
coord = long_lat_matrix,
map.polygon = breeding_range,
window = extent(breeding_range)[1:4],
resolution = c(300,300), # if you want more cells in your raster, set higher
# this next lines need to to be here, but don't do much...
col.palette = tess3r::CreatePalette(cluster_colors, length(cluster_colors)),
method = "map.max",
interpol = tess3r::FieldsKrigModel(10),
main = "Ancestry coefficients",
xlab = "Longitude",
ylab = "Latitude",
cex = .4
)
# after that, we need to add names of the clusters back onto this raster brick
names(genoscape_brick) <- names(Q_tibble)[-1]
That gives us a raster brick of Q-values associated with each cell in the raster, but those values are not always constrained between 0 and 1, so we have to massage them a little bit in the next section.
For this we use the function
genoscapeRtools::qprob_rando_raster()
. This takes the
raster brick that comes out of tess3Q_map_rasters() and does some
rescaling and (maybe) some random sampling to return a raster of colors
that I hope will do a reliable job of representing (in some way)
predicted assignment accuracy over space. See
?genoscapeRtools::qprob_rando_raster
to learn about the
scaling options, etc. (However, I am not convinced that all of those
options are reliably estimated.)
This will squash the raster brick down to a single RGBA (i.e., four channels, red, green, blue and alpha) raster brick.
genoscape_rgba <- genoscapeRtools::qprob_rando_raster(
TRB = genoscape_brick,
cols = cluster_colors,
alpha_scale = 2.0,
abs_thresh = 0.0,
alpha_exp = 1.55,
alpha_chop_max = 230
)
# at this point, we must be explicit about adding a projection to the raster.
# This adds the info for a regular lat-long projection
crs(genoscape_rgba) <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
We can easily plot this with the function layer_spatial
from the ggspatial
package:
ggplot() +
ggspatial::layer_spatial(genoscape_rgba) +
theme_bw() +
coord_sf()
Note that if we wanted to add the actual sampling points to this
(that are given in long_lat_tibble
), we can use ggspatial’s
geom_spatial_point()
function.
ggplot() +
layer_spatial(genoscape_rgba) +
geom_spatial_point(data = long_lat_tibble, mapping = aes(x = Long, y = Lat)) +
theme_bw() +
coord_sf()
It would probably be better to jigger those points a little bit. That could be done by mutating each of them a random bit away.
Since we will be using the Natural Earth Data Set for the raster background of our map, we will also use the Natural Earth lines and polygons. The Natural Earth data set is an amazing, open-source resource for making beautiful maps. Check it out at naturalearthdata.com.
For coastlines, countries, and states/provinces, you will need to
download three different shape files. We will be working with the
highest resolution Natural Earth data sets which are the
10m
versions. Download and unzip them to a directory within
this project called ne_shapefiles
. You can do that with R
like this:
dir.create("ne_shapefiles", showWarnings = FALSE)
tmpfile <- tempfile()
downloader::download(
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_coastline.zip",
dest = tmpfile
)
unzip(zipfile = tmpfile, exdir = "ne_shapefiles")
We do the same for country boundaries, and then state/province boundaries, too. Note, in both cases we just want to get the boundary lines that are internal to the coast. Otherwise you can end up getting a thick line around the coast that is not so great…
# country boundaries
tmpfile <- tempfile()
downloader::download(
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_0_boundary_lines_land.zip",
dest = tmpfile
)
unzip(zipfile = tmpfile, exdir = "ne_shapefiles")
# state and province boundaries
# country boundaries
tmpfile <- tempfile()
downloader::download(
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/cultural/ne_10m_admin_1_states_provinces_lines.zip",
dest = tmpfile
)
unzip(zipfile = tmpfile, exdir = "ne_shapefiles")
First read it in:
coastlines <- st_read("ne_shapefiles/ne_10m_coastline.shp")
Reading layer `ne_10m_coastline' from data source `/Users/eriq/Documents/git-repos/make-a-BGP-map/ne_shapefiles/ne_10m_coastline.shp' using driver `ESRI Shapefile'
Simple feature collection with 4133 features and 3 fields
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -180 ymin: -85.22194 xmax: 180 ymax: 83.6341
Geodetic CRS: WGS 84
Now, let’s just crop out the part that we want. This is somewhat key: right here we will define the extent in lat-long of the region that we want to plot:
# note! it is important to put the elements of domain in this
# order, because the function raster::extent() is expecting thing
# to be in this order, and doesn't parse the names of the vectors
# the way sf::st_crop() does.
domain <- c(
xmin = -135,
xmax = -60,
ymin = 22,
ymax = 60
)
coast_cropped <- st_crop(coastlines, domain)
Warning: attribute variables are assumed to be spatially constant throughout all geometries
Then plot the cropped part:
ggplot(coast_cropped) +
geom_sf() +
coord_sf()
OK, that was pretty reasonable. Now crop all the lines to
domain
and plot them, and put the genoscape on top of that,
too.
countries_cropped <- st_read("ne_shapefiles/ne_10m_admin_0_boundary_lines_land.shp") %>%
st_crop(domain)
Reading layer `ne_10m_admin_0_boundary_lines_land' from data source
`/Users/eriq/Documents/git-repos/make-a-BGP-map/ne_shapefiles/ne_10m_admin_0_boundary_lines_land.shp' using driver `ESRI Shapefile'
Simple feature collection with 462 features and 18 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: -141.0055 ymin: -55.12092 xmax: 140.9776 ymax: 70.07531
Geodetic CRS: WGS 84
Warning: attribute variables are assumed to be spatially constant throughout all geometries
states_cropped <- st_read("ne_shapefiles/ne_10m_admin_1_states_provinces_lines.shp") %>%
st_crop(domain)
Reading layer `ne_10m_admin_1_states_provinces_lines' from data source
`/Users/eriq/Documents/git-repos/make-a-BGP-map/ne_shapefiles/ne_10m_admin_1_states_provinces_lines.shp' using driver `ESRI Shapefile'
Simple feature collection with 10114 features and 21 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: -178.1371 ymin: -49.25087 xmax: 178.4486 ymax: 81.12853
Geodetic CRS: WGS 84
Warning: attribute variables are assumed to be spatially constant throughout all geometries
Now, plot it all. Notice we do it by just adding layers.
mapg <- ggplot() +
geom_sf(data = coast_cropped) +
geom_sf(data = countries_cropped, fill = NA) +
geom_sf(data = states_cropped, fill = NA) +
ggspatial::layer_spatial(genoscape_rgba) +
theme_bw()
# now plot it under default lat-long projection
mapg +
coord_sf()
That is looking like it should, but the projection is pretty ugly. In the next section we will consider projection.
The new version of ggplot2
makes it super easy to
project everything on the fly by passing the projection to
coord_sf()
. For things in North America, it seems that a
Lambert conic projection does a nice job of keeping British Columbia and
Alaska from looking way too big. We can define such a projection with a
string, like this:
lamproj <- "+proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-100 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"
Now, let’s see how our developing map looks under such a projection.
We just define lamproj
as a coordinate reference system and
pass it in to coord_sf()
:
mapg +
coord_sf(crs = st_crs(lamproj))
I would call that decidedly better.
Now, all that remains is to put all of this on top of a nicely tinted map that shows landforms and things. Note that once we have such a layer under the rest of this stuff, we will probably omit the coastline layer, since it gets a little dark where the coastline is highly dissected.
I am going to recommed that you get the hypsometrically tinted
Natural Earth map with water bodies and rivers on it, and you might as
well get the one that has some ocean basin coloring too. We will put it
into a directory in the project called ne_rasters
. Note,
this expands to about half a gig of data.
dir.create("ne_rasters", showWarnings = FALSE)
tmpfile <- tempfile()
downloader::download(
url = "https://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/raster/HYP_HR_SR_OB_DR.zip",
dest = tmpfile
)
unzip(zipfile = tmpfile, exdir = "ne_rasters")
Reading a large raster in with the raster package does not take up
much memory, because it leaves it on disk. So we will open a connection
to the big, massive raster using the brick
function from
the raster
package and then crop it:
hypso <- brick("ne_rasters/HYP_HR_SR_OB_DR/HYP_HR_SR_OB_DR.tif")
hypso_cropped <- crop(hypso, extent(domain))
Now, we just add hypso_cropped
in with
ggspatial::spatial_layer()
below all of our other layers
(dropping the coastlines), and we might make country and state lines
thinner (and different sizes):
big_one <- ggplot() +
ggspatial::layer_spatial(hypso_cropped) +
geom_sf(data = countries_cropped, fill = NA, size = 0.15) +
geom_sf(data = states_cropped, fill = NA, size = 0.1) +
ggspatial::layer_spatial(genoscape_rgba) +
theme_bw() +
coord_sf(crs = st_crs(lamproj))
big_one
Seeing that sort of small on the screen does not really do justice to
it. You can ggsave
it in whatever size is desired. For
whatever the final product is going to be, you will want to mess with
the line thickness on those country and state borders…
While that looks nice, most people might prefer to have the
background part there in a rectangle, rather than a conic surface. We
can do that by setting the xlim and ylim to coord_sf()
.
But, note that we have to do that in terms of the units that the Lambert
projection is referring to, not just in simple lat-long coordinates
(i.e. we have to project those too!)
So, imagine that we want to chop out a rectangle from Haida Gwai, off the coast of British Columbia, straight down and straight across to the right (on the image). And we would want the bottom and right sides determined by a point in the bottom right that has the y-value of the tip of Florida, and an x-value which is essentially at St. John, New Brunswick. Then, we need to determine the x and y coordinates of those points under our Lambert conformal conic projection. We do that by making a simple features tibble of the lat-longs for those points and then projecting it.
# Here are lat longs for those points
pts <- tribble(
~place, ~long, ~lat,
"HaidaGwai", -132.3143526, 53.7298050,
"FloridaTip", -80.8032237, 25.0909780,
"St.Johns", -65.943453, 45.291222
)
# here we turn that into an sf object
pts_sf <- st_as_sf(pts, coords = c("long", "lat"),
crs = 4326) # 4326 is shorthand for "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
# and now we transform it to lambert
pts_lambert <- st_transform(pts_sf, crs = lamproj)
# when we print it in a notebook, we can see that the geometry is a list
# column, but can't see the values:
pts_lambert
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -2010114 ymin: -1366911 xmax: 2452877 ymax: 1821869
Projected CRS: +proj=lcc +lat_1=20 +lat_2=60 +lat_0=40 +lon_0=-100 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs
So, for you to be able to see the values we can just print it like this:
# just note the values:
st_as_text(pts_lambert$geometry)
[1] "POINT (-2010114 1821869)" "POINT (1871262 -1366911)" "POINT (2452877 1037658)"
That means we want xlim = c(-2010114, 2452877) and ylim = c(-1366911, 1821869). Let’s try that:
rectangled <- ggplot() +
ggspatial::layer_spatial(hypso_cropped) +
geom_sf(data = countries_cropped, fill = NA, size = 0.15) +
geom_sf(data = states_cropped, fill = NA, size = 0.1) +
ggspatial::layer_spatial(genoscape_rgba) +
theme_bw() +
coord_sf(
crs = st_crs(lamproj),
xlim = c(-2010114, 2452877),
ylim = c(-1366911, 1821869),
expand = FALSE) # expand = FALSE means put the axes right at the xlim and ylim
rectangled
Okay, that does it. In reality, you would probably want to expand the
original domain
to be a bigger chunk of the earth, so that
when you cut stuff off there was more left…
If you want to add more elements to the figure you can do so by giving them a lat-long and making a simple features tibble out of it and then putting them there using geom_sf(). It all fits nicely into the ggplot framework.
sessioninfo::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.3 (2024-02-29)
os macOS Monterey 12.7.5
system aarch64, darwin20
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/Denver
date 2024-11-16
rstudio 2024.09.0+375 Cranberry Hibiscus (desktop)
pandoc 3.1.12.1 @ /Users/eriq/mambaforge-arm64/bin/pandoc
─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 4.3.0)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.3.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.0)
class 7.3-22 2023-05-03 [2] CRAN (R 4.3.3)
classInt 0.4-9 2023-02-28 [1] CRAN (R 4.3.0)
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.3)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
DBI 1.2.0 2023-12-21 [1] CRAN (R 4.3.1)
digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.0)
dotCall64 1.2 2024-10-04 [1] CRAN (R 4.3.3)
downloader 0.4 2015-07-09 [1] CRAN (R 4.3.0)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
e1071 1.7-13 2023-02-01 [1] CRAN (R 4.3.0)
fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)
fields 16.3 2024-09-30 [1] CRAN (R 4.3.3)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
genoscapeRtools 0.1.0 2024-11-15 [1] Github (eriqande/genoscapeRtools@a15006b)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
ggspatial * 1.1.9 2023-08-17 [1] CRAN (R 4.3.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
KernSmooth 2.23-22 2023-07-10 [2] CRAN (R 4.3.3)
knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.3.0)
lattice 0.22-5 2023-10-24 [2] CRAN (R 4.3.3)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
maps 3.4.2 2023-12-15 [1] CRAN (R 4.3.1)
Matrix 1.6-1 2023-08-14 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
proxy 0.4-27 2022-06-09 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
raster * 3.6-30 2024-10-02 [1] CRAN (R 4.3.3)
Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.0)
RcppEigen 0.3.3.9.3 2022-11-05 [1] CRAN (R 4.3.0)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)
rlang 1.1.2 2023-11-04 [1] CRAN (R 4.3.1)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
s2 1.1.4 2023-05-17 [1] CRAN (R 4.3.0)
scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
sf * 1.0-14 2023-07-11 [1] CRAN (R 4.3.0)
sp * 2.1-2 2023-11-26 [1] CRAN (R 4.3.1)
spam 2.11-0 2024-10-03 [1] CRAN (R 4.3.3)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
terra 1.7-65 2023-12-15 [1] CRAN (R 4.3.1)
tess3r 1.1.0 2024-11-15 [1] Github (eriqande/TESS3_encho_sen@2a0ce6c)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
units 0.8-3 2023-08-10 [1] CRAN (R 4.3.0)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.0)
vroom 1.6.3 2023-04-28 [1] CRAN (R 4.3.0)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
wk 0.8.0 2023-08-25 [1] CRAN (R 4.3.0)
xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
[1] /Users/eriq/Library/R/4.3/library
[2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────