CKMR: A general overview
Paul B. Conn
The Wildlife Society CKMR Workshop, Sunday November 6, 2022
Paul Conn: Research statistician with the Marine Mammal Laboratory at NOAA Alaska Fisheries Science Center.
Eric Anderson: Research geneticist at NOAA’s Southwest Fisheries Science Center.
Other acknowledgments: Mark Bravington (CSIRO); Brian Taras, Lori Quakenbush (ADF&G)
8:00 - 8:45 Close-kin mark-recapture: An overview (P. Conn)
8:45 - 9:30 An introduction to genetic data and inheritance (E. Anderson)
9:30 - 9:45 Break
9:45 - 10:30 Statistical inference for CKMR abundance estimation (P. Conn)
10:30 - 11:15 Kin finding (E. Anderson)
11:15 - 12:00 Designing a CKMR study
12:00 - 1:00 Lunch
1:00 - 5:00 R/TMB labs (full day participants only). Note: you should have followed the “Setting up your computer” instructions in the workshop book!
The basic idea of how CKMR works
Types of CKMR models and their assumptions
Strengths and limitations of CKMR for wildlife monitoring and management
Basic ideas on how to design a CKMR study
For full day participants, some ideas of how to code things up
We don’t expect anyone to be a CKMR expert after taking this workshop. Successful CKMR implementations require expertise in several areas (including ecology, genetics, and statistics), and there are only a few people on earth who are experts in all of these!
Slides for morning lectures: Lecture 1 (intro) https://eriqande.github.io/tws-ckmr-2022/slides/paul-talk-1.Rmd.htm
“Book” for afternoon labs: https://eriqande.github.io/tws-ckmr-2022/
General workshop github repository: https://github.com/eriqande/tws-ckmr-2022
A CKMR website w/ more examples: https://closekin.github.io/
Sample occasion 1: mark \(n\) animals (blue) out of a population of \(N\) animals
Sample occasion 2: capture \(M\) animals, \(m\) of which were previously marked
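As a point of comparison for what follows, the two-occasion design above is commonly analyzed with the classical Lincoln-Petersen estimator, \(\hat{N} = nM/m\). The slides do not name this estimator, so treat the sketch below as an illustration under that assumption; the sample numbers are hypothetical.

```python
def lincoln_petersen(n_marked, n_caught, n_recaptured):
    """Classical Lincoln-Petersen abundance estimate, N_hat = n * M / m.

    n_marked     : animals marked on occasion 1 (n)
    n_caught     : animals captured on occasion 2 (M)
    n_recaptured : occasion-2 captures that carried a mark (m)
    """
    if n_recaptured <= 0:
        raise ValueError("need at least one recapture to form the estimate")
    return n_marked * n_caught / n_recaptured


# Hypothetical numbers, chosen only for illustration
n, M, m = 100, 80, 16
N_hat = lincoln_petersen(n, M, m)
print(N_hat)  # 500.0
```

The intuition carries over to CKMR: the fewer "recaptures" (there, kin pairs) we see for a given amount of sampling, the larger the population must be.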
\(\color{blue}{\text{Mark-recapture}}\)
\(\color{blue}{\text{CKMR}}\)
A framework for estimating adult abundance and survival using the frequency of observed kinship relationships
Parent-offspring pairs (POPs): adult abundance and reproductive schedules (assuming age is known…)
Half-sibling pairs (HSPs): adult abundance and survival (again assuming ages are known)
Compare each genotyped sample to all of the others. We can then maximize the pseudo-likelihood
\(\prod_i \prod_{j>i} p_{ij}^{y_{ij}} (1-p_{ij})^{1-y_{ij}}\)
\(y_{ij}\) is a binary random variable taking on the value 1 if animals \(i\) and \(j\) are a match, and 0 otherwise.
\(p_{ij}\) is the probability of a match
Because \(y_{ij}\) is binary, the same pseudo-likelihood can equivalently be written as
\(\prod_i \prod_{j>i} \left[ p_{ij} y_{ij} + (1-p_{ij}) (1-y_{ij}) \right]\)
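A minimal sketch of how the pseudo-likelihood could be evaluated, assuming the match probabilities and outcomes are stored as square nested lists indexed by animal. The function names and the toy three-animal data are hypothetical and are not taken from the workshop code. The log form is usually what gets maximized in practice, and the sketch checks that it agrees with the product form.

```python
import math


def log_pseudo_likelihood(p, y):
    """Sum of Bernoulli log-probabilities over all pairs with j > i.

    p[i][j] is the match probability and y[i][j] the 0/1 outcome for pair (i, j).
    Only the upper triangle is used. A pair with y = 1 and p = 0 (or y = 0 and
    p = 1) is impossible under the model and makes math.log raise ValueError.
    """
    total = 0.0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            prob = p[i][j] if y[i][j] == 1 else 1.0 - p[i][j]
            total += math.log(prob)
    return total


def pseudo_likelihood_product(p, y):
    """Product form: prod over j > i of [p*y + (1 - p)*(1 - y)]."""
    total = 1.0
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            total *= p[i][j] * y[i][j] + (1.0 - p[i][j]) * (1 - y[i][j])
    return total


# Toy data: three animals, every pair has match probability 0.1,
# and only the pair (0, 1) was found to be a match.
p = [[0.1] * 3 for _ in range(3)]
y = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 0]]

log_pl = log_pseudo_likelihood(p, y)
assert math.isclose(math.exp(log_pl), pseudo_likelihood_product(p, y))
```

With realistic sample sizes the product underflows quickly, which is one reason to work with the sum of logs.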
But how do we figure out what the \(p_{ij}\) probabilities are? And how are these related to what we care about (abundance and survival)?
- Depends on what type of relationship is being considered, sex of parent, etc.
- Calculations rely on expected relative reproductive output (ERRO)
Lexis diagrams are helpful!
\[\begin{equation*} p_{ij} = \begin{cases} 0, & \text{if}\ a_i(b_j) < a_{mat} \text{ or } d_i < b_j \\ 1/N_{b_j}^F, & \text{otherwise} \end{cases} \end{equation*}\]
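In words: the probability of a mother-offspring pair is zero if the potential mother \(i\) was reproductively immature or dead at the time of \(j\)’s birth. Otherwise, it is simply 1 over the number of reproductively mature females in the year of \(j\)’s birth. Here \(a_i(b_j)\) is \(i\)’s age in \(j\)’s birth year \(b_j\), \(a_{mat}\) is the age at maturity, \(d_i\) is the year of \(i\)’s death, and \(N_{b_j}^F\) is the number of mature females in year \(b_j\).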
\[\begin{equation*} p_{ij} = \begin{cases} 0, & \text{if}\ a_i(b_j) < a_{mat} \\ 1/N_{b_j}^F, & \text{otherwise} \end{cases} \end{equation*}\]
In words: the probability of a mother-offspring pair is zero if the potential mother was reproductively immature at the time of \(j\)’s birth. Otherwise, it is simply 1 over the number of reproductively mature females in the year of \(j\)’s birth.
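The mother-offspring probability with both the maturity and death conditions can be sketched as a small function. This is an illustration only: the function names, the dictionary of yearly female abundance, and all numbers are hypothetical. The second helper additionally assumes a constant number of mature females and that every comparison is an eligible one (so \(p_{ij} = 1/N^F\) throughout), in which case maximizing the Bernoulli pseudo-likelihood gives \(\hat{N}^F\) = comparisons / matches.

```python
def mop_probability(birth_i, death_i, birth_j, age_mat, n_mature_females):
    """Probability that female i is the mother of animal j.

    birth_i, death_i : birth year and death year of the potential mother i
    birth_j          : birth year of the potential offspring j (b_j)
    age_mat          : age at reproductive maturity (a_mat)
    n_mature_females : dict mapping year -> number of mature females (N^F)
    """
    age_at_birth_j = birth_j - birth_i  # a_i(b_j)
    if age_at_birth_j < age_mat or death_i < birth_j:
        return 0.0  # i was immature or already dead when j was born
    return 1.0 / n_mature_females[birth_j]


def constant_nf_estimate(n_comparisons, n_matches):
    """Estimate of N^F when p_ij = 1/N^F for every comparison (strong assumption)."""
    if n_matches <= 0:
        raise ValueError("need at least one match")
    return n_comparisons / n_matches


# Hypothetical abundance of mature females by year
n_females = {2010: 500, 2015: 400}

p_ok = mop_probability(2000, 2018, 2010, 5, n_females)        # mature and alive
p_immature = mop_probability(2008, 2018, 2010, 5, n_females)  # age 2 < a_mat
p_dead = mop_probability(2000, 2009, 2010, 5, n_females)      # died before b_j
print(p_ok, p_immature, p_dead)
```

Dropping the `death_i < birth_j` check gives the simpler version of the formula that has only the maturity condition.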
Accurate genotyping (no false positives!)
Population and sampling models are accurate
Kinship comparisons are “independent” (or close enough…)
Any relationship between kinship probabilities and sampling probabilities can be explained by observed (or inferred) covariates, e.g.:
Age
Spatial location
Status (Mating hierarchy)
We need enough genetic markers to tell apart various kin groups. For parent-offspring pairs we might only need 200 SNPs or so, but for half-siblings it is nice to have 3-4K (after pruning ill-behaved loci).
\(\color{red}{\rightarrow \text{High quality tissue samples}}\)
For species where reproductive maturity is not instantaneous, we need to model pre-adult population dynamics, so we need some idea of early survival and reproductive schedules (decent early life history information!). We also need to get the underlying Leslie matrix right (pre vs. postbreeding census, etc.)
Accurate sampling models have more to do with independent fates. E.g. we won’t want to model mothers and dependent offspring harvested in the same year, half-siblings harvested together, etc.
The quality of the pseudo-likelihood as an approximation decreases as the amount of relatedness in a population increases. The usual effect when this happens in statistics is that precision (e.g., confidence intervals) is overstated.
CKMR has been conducted on populations as small as \(\approx 600\), but we don’t want to go much lower than that.
Populations that are “not too big and not too small” (e.g., several hundred to ten million or so). We need \(\approx 50\) kin pairs to produce reasonable estimates, and the required number of samples increases with \(\sqrt{N}\).
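The \(\sqrt{N}\) scaling can be seen with a back-of-the-envelope calculation. The sketch below rests on simplifying assumptions that are not stated in the slides: only parent-offspring pairs are counted, each sampled juvenile has two parents among \(N\) equally likely adults (so the expected number of POPs is roughly \(2 n_J n_A / N\)), samples are split evenly between juveniles and adults, and mortality and age structure are ignored. The function names are hypothetical.

```python
import math


def expected_pops(n_juveniles, n_adults, n_adult_pop):
    """Rough expected number of parent-offspring pairs, 2 * n_J * n_A / N."""
    return 2.0 * n_juveniles * n_adults / n_adult_pop


def samples_for_target(n_adult_pop, target_pairs=50):
    """Total samples (half juveniles, half adults) expected to yield target_pairs POPs.

    Solves 2 * (n/2) * (n/2) / N = target_pairs for n,
    giving n = sqrt(2 * target_pairs * N).
    """
    return math.sqrt(2.0 * target_pairs * n_adult_pop)


# Required samples grow with the square root of adult abundance
for N in (1_000, 100_000, 10_000_000):
    n_total = samples_for_target(N)
    print(N, round(n_total), expected_pops(n_total / 2, n_total / 2, N))
```

Under these assumptions a target of 50 pairs works out to about \(10\sqrt{N}\) samples, so a 100-fold larger population needs only about 10 times as many samples. Real designs also depend on life history, the mix of kin types used, and how samples are spread over ages and years.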
Decent genetic variation (severe inbreeding may make it difficult to discriminate different kin pair types)
Good “mixing” (either through movement or through sampling)
Group living species
One mother and one father! No weird breeding systems (e.g., armadillos)
Ages are extremely helpful (teeth? epigenetics? sampling young?)
Some will require case-specific developments (philopatry, spatial structure, pair bonding)
Skill level probably depends on what type of data (e.g., POP-only, POP+HSP, single cohort vs. multiple cohort)
Relatively low cost, especially after markers and aging methods are developed (epigenetics?)
You’re going to want to have a biologist, a biometrician, and a geneticist involved. Very few people have all of these skills, and it’s a lot to ask of a single person (especially a grad student!)
Many models will need to be population- and data-dependent and will require bespoke code. That said, there are examples and templates out there that will help.
CKMR “looks backwards”: inference is based on ERRO at the time of each offspring’s birth
Precision tends to be best “back in time” - precision in present day not usually as good (especially for long-lived species; see beluga example here)
Implications for monitoring/management
There are sometimes ways to help improve precision in the present by designing a CKMR experiment correctly!
Paper in prep (Taras, Conn, Quakenbush, Bravington, Baylis). Annual sampling of bearded seal subsistence harvests (tissue samples + teeth) by ADF&G
Our paper isn’t published yet so I can’t make all data public yet. But we can still use life history information, approximate sample sizes, and kin finding data to help motivate ideas and conduct some modeling exercises.
Size of Beringia DPS thought to be 400-500K seals!
We had something like 2000 tissue samples over \(\approx 20\) years.
Can we use this to get an idea of overall abundance? What special things will we need to consider when fitting models to these data?