paul-talk-1.rmd

CKMR: A general overview

Paul B. Conn

The Wildlife Society CKMR Workshop, Sunday November 6, 2022

Outline

Introductions
Preliminaries
Capture-recapture vs CKMR
History
CKMR Workflow
Expected relative reproductive output and pseudo-likelihood
CKMR Assumptions
Random (or not so random) thoughts
A CKMR case study: Bearded seals

Introductions

Paul Conn: Research statistician with the Marine Mammal Laboratory at NOAA Alaska Fisheries Science Center.

Eric Anderson: Research geneticist at NOAA’s Southwest Fisheries Science Center.

Other acknowledgments: Mark Bravington (CSIRO); Brian Taras, Lori Quakenbush (ADF&G)

Preliminaries: Schedule

8:00 - 8:45 Close-kin mark-recapture: An overview (P. Conn)
8:45 - 9:30 An introduction to genetic data and inheritance (E. Anderson)
9:30 - 9:45 Break
9:45 - 10:30 Statistical inference for CKMR abundance estimation (P. Conn)
10:30 - 11:15 Kin finding (E. Anderson)
11:15 - 12:00 Designing a CKMR study
12:00 - 1:00 Lunch
1:00 - 5:00 R/TMB labs (full day participants only)¹

¹ You should have followed the “Setting up your computer” instructions in the workshop book!

Preliminaries: What we want you to get out of this workshop

The basic idea of how CKMR works
Types of CKMR models and their assumptions
Strengths and limitations of CKMR for wildlife monitoring and management
Basic ideas on how to design a CKMR study
For full day participants, some ideas of how to code things up

We don’t expect anyone to be a CKMR expert after taking this workshop. Successful CKMR implementations require several kinds of expertise (including ecology, genetics, and statistics), and there are only a few people on earth who are experts in all of these!!!

Preliminaries: Resources

Slides for morning lectures: Lecture 1 (intro) https://eriqande.github.io/tws-ckmr-2022/slides/paul-talk-1.Rmd.htm

“Book” for afternoon labs: https://eriqande.github.io/tws-ckmr-2022/

General workshop github repository: https://github.com/eriqande/tws-ckmr-2022

A CKMR website w/ more examples: https://closekin.github.io/

Mark-recapture vs. CKMR

Mark-recapture

sampling on \(>1\) occasion
need capture probability \(p>0.2\) for decent estimation
estimate abundance, survival, etc.
intensive sampling!

Mark-recapture vs. CKMR

Simple MR: Lincoln-Petersen estimator

Sample occasion 1: mark \(n\) animals (blue) out of a population of \(N\) animals

Sample occasion 2: capture \(M\) animals, \(m\) of which were previously marked

Goal: estimate population size, \(N\)
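Intuition: the marked fraction of the second sample should match the marked fraction of the population, \(m/M \approx n/N\)
Estimator: solving for \(N\) gives \(\hat{N} = nM/m\)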
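The Lincoln-Petersen logic fits in a few lines of Python. This is a minimal illustration, not workshop code. The function names `lincoln_petersen` and `simulate_lp` are hypothetical. The simulation assumes a closed population in which every animal is equally likely to be caught on each occasion. Averaged over many simulated studies, the estimate should land near the true \(N\). Lincoln-Petersen has a small positive bias when \(m\) is small.

```python
import random


def lincoln_petersen(n: int, M: int, m: int) -> float:
    """Lincoln-Petersen estimate N_hat = n * M / m.

    n: animals marked on occasion 1
    M: animals captured on occasion 2
    m: previously marked animals among those M
    """
    if m == 0:
        raise ValueError("no marked animals recaptured (m = 0); estimator undefined")
    return n * M / m


def simulate_lp(N: int, n: int, M: int, seed: int = 1) -> float:
    """Simulate one two-occasion study of a closed population of N animals,
    with every animal equally catchable, and return the estimate."""
    rng = random.Random(seed)
    marked = set(rng.sample(range(N), n))            # occasion 1: mark n animals
    second = rng.sample(range(N), M)                 # occasion 2: capture M animals
    m = sum(animal in marked for animal in second)   # recaptures of marked animals
    return lincoln_petersen(n, M, m)


print(lincoln_petersen(100, 80, 20))  # 400.0
estimates = [simulate_lp(N=1000, n=200, M=200, seed=s) for s in range(200)]
print(round(sum(estimates) / len(estimates)))  # close to the true N = 1000
```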

Mark-recapture vs. CKMR

CKMR

offspring “mark” two parents
sampling on \(\ge 1\) occasion
observed kin pair frequencies used to estimate adult survival and abundance
No need to release animals live
potentially easier data to come by (harvests)

Mark-recapture vs. CKMR

CKMR

Example: sample \(n_j = 4\) juveniles, \(M = 6\) adults (dark colored)
Want to make inference about the number of adults
Each juvenile has exactly two parents, so \(n = 2n_j = 8\) parents are “marked”
Compare the genotypes of sampled juveniles with those of sampled adults to find parent-offspring relationships
\(m=3\) parents found among the sampled adults
\(\hat{N} = nM/m = 8 \times 6/3 = 16\)
Amazing!
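Here is the same arithmetic with a toy simulation added. The assumptions are all simplifications for illustration. There is a single cohort of juveniles. Parents are drawn at random from all adults, with sex ignored. Every sampled adult is compared with every juvenile. `ckmr_lp` and `simulate_ckmr` are hypothetical names.

```python
import random


def ckmr_lp(n_juv: int, M: int, m: int) -> float:
    """Lincoln-Petersen-style estimate of adult abundance from parent-offspring
    matches: each sampled juvenile 'marks' its two parents, so n = 2 * n_juv."""
    if m == 0:
        raise ValueError("no parent-offspring matches (m = 0); estimator undefined")
    return 2 * n_juv * M / m


def simulate_ckmr(N_adults: int, n_juv: int, M: int, seed: int = 1) -> float:
    """One toy study: M adults sampled at random, and each of n_juv juveniles
    has two distinct parents drawn at random from all N_adults (sexes ignored)."""
    rng = random.Random(seed)
    adults = range(N_adults)
    sampled = set(rng.sample(adults, M))
    m = 0
    for _ in range(n_juv):
        parent_1, parent_2 = rng.sample(adults, 2)
        m += (parent_1 in sampled) + (parent_2 in sampled)  # POPs found
    return ckmr_lp(n_juv, M, m)


print(ckmr_lp(n_juv=4, M=6, m=3))  # 16.0, the slide example
estimates = [simulate_ckmr(N_adults=1000, n_juv=150, M=150, seed=s) for s in range(200)]
print(round(sum(estimates) / len(estimates)))  # close to the true 1000 adults
```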

Mark-recapture vs. CKMR

Beyond Lincoln-Petersen

\(\color{blue}{\text{Mark-recapture}}\)

Explosion of the mark-recapture literature
Extensions allowing multiple occasions
Survival estimation (CJS)
Spatial capture-recapture (including multistate)
Flexible software (e.g., Program MARK)

\(\color{blue}{\text{CKMR}}\)

Relatively new
Extensions for multiple years (monitoring programs)
Use of half-siblings to estimate adult survival
Few spatial applications
Kin-finding software but no specific software for estimation (must tailor to study system)

\(\color{red}{\rightarrow \text{Likelihood}}\)

History

Initial development by Hans Skaug (influenced by Tore Schweder) in the late 1990s
Some convergent evolution (Rawding et al., some other cetacean work)
In the 2010s, improvements in genetic technology and further statistical development (Bravington et al. 2016) paved the way for wider use
Web of Science search (Oct 2022)

Figure 1: Web of Science publications on CKMR or close-kin mark-recapture.

CKMR in a nutshell

A framework for estimating adult abundance and survival using the frequency of observed kinship relationships

Parent-offspring pairs (POPs): adult abundance and reproductive schedules (assuming ages are known…)

Half-sibling pairs (HSPs): adult abundance and survival (again assuming ages are known)
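This sketch shows why HSPs carry information about survival. It computes maternal POP and HSP probabilities under strong simplifying assumptions, which are ours and not necessarily the workshop's:

- knife-edge maturity,
- equal fecundity among mature females,
- a constant annual adult female survival probability \(\phi\).

Under these assumptions, two juveniles born in years \(b_i < b_j\) share a mother with probability \(\phi^{b_j - b_i} / N_{b_j}^F\), because the mother of \(i\) has to survive the gap between the two births. The function names are hypothetical.

```python
def pop_prob(n_mature_females: float) -> float:
    """P(a given mature female is the mother of a given juvenile) = 1 / N^F."""
    return 1.0 / n_mature_females


def maternal_hsp_prob(b_i: int, b_j: int, phi: float,
                      n_mature_females_bj: float) -> float:
    """P(juveniles born in years b_i < b_j share a mother).

    i's mother (mature at b_i) must survive the b_j - b_i years between the two
    births, with probability phi ** (b_j - b_i); if she is alive, she is one of
    N^F_{b_j} equally likely mothers of j.
    """
    if b_j <= b_i:
        raise ValueError("expects a cross-cohort comparison with b_i < b_j")
    return phi ** (b_j - b_i) / n_mature_females_bj


for gap in (1, 3, 5):
    print(gap, maternal_hsp_prob(2000, 2000 + gap, phi=0.9, n_mature_females_bj=1000))
```

The HSP probability shrinks geometrically as the birth-year gap grows. So the pattern of matches across gaps informs \(\phi\), while the overall level of matches informs abundance.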

CKMR Workflow

Pseudo-likelihood

Compare each genotyped sample with every other sample, then maximize the pseudo-likelihood

\(\prod_i \prod_{j>i} p_{ij}^{y_{ij}} (1-p_{ij})^{1-y_{ij}}\)

\(y_{ij}\) is a binary random variable taking the value 1 if animals \(i\) and \(j\) are a match (e.g., a parent-offspring pair) and 0 otherwise.

\(p_{ij}\) is the probability of a match

Because each \(y_{ij}\) is 0 or 1, the same pseudo-likelihood can also be written as

\(\prod_i \prod_{j>i} \left[p_{ij} y_{ij} + (1-p_{ij}) (1-y_{ij})\right]\)

\(\color{red}{\text{In reality, random variables are not independent!!}}\)

\(\color{red}{\text{So the pseudo-likelihood is an approximation}}\)
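This sketch evaluates and maximizes the log pseudo-likelihood under a deliberately simple, hypothetical model. Every comparison has the same match probability \(p_{ij} = 1/N\). Think of mother-offspring comparisons in which every female was mature and alive at the juvenile's birth. In that special case the maximizer is just the number of comparisons divided by the number of matches, so the grid search should recover it. Real applications would let \(p_{ij}\) vary by pair and would use a numerical optimizer rather than a grid.

```python
import math


def log_pseudo_likelihood(p, y):
    """Log of prod_{i<j} p_ij^y_ij * (1 - p_ij)^(1 - y_ij), i.e., treating every
    pairwise comparison as an independent Bernoulli trial."""
    return sum(y_ij * math.log(p_ij) + (1 - y_ij) * math.log(1.0 - p_ij)
               for p_ij, y_ij in zip(p, y))


def fit_abundance(y, n_grid):
    """Grid-search the abundance N that maximizes the pseudo-likelihood under a
    hypothetical model in which every comparison has p_ij = 1 / N."""
    best_N, best_ll = None, -math.inf
    for N in n_grid:
        ll = log_pseudo_likelihood([1.0 / N] * len(y), y)
        if ll > best_ll:
            best_N, best_ll = N, ll
    return best_N


# 1000 eligible comparisons, 4 of which are kin matches
y = [1] * 4 + [0] * 996
print(fit_abundance(y, range(100, 501)))  # 250, i.e., 1000 / 4
```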

Expected relative reproductive output

But how do we figure out what the \(p_{ij}\) probabilities are? And how are these related to what we care about (abundance and survival)?

- Depends on what type of relationship is being considered, the sex of the parent, etc.

- Calculations rely on expected relative reproductive output (ERRO)

Lexis diagrams are helpful!

Expected relative reproductive output

Simple example: mother-offspring pairs, knife-edge sexual maturity, no heterogeneity in reproductive success, \(b_i < b_j\) (\(i\) born before \(j\))

\[\begin{equation*} p_{ij} = \begin{cases} 0, & \text{if}\ a_i(b_j) < a_{mat} \text{ or } d_i < b_j \\ 1/N_{b_j}^F, & \text{otherwise} \end{cases} \end{equation*}\]

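In words: the probability of a mother-offspring pair is zero if the potential mother was reproductively immature or dead at the time of \(j\)’s birth. Otherwise, it is simply 1 over the number of reproductively mature females alive that year. Here \(b_j\) is \(j\)’s birth year, \(a_i(b_j)\) is \(i\)’s age in that year, \(a_{mat}\) is the age at maturity, \(d_i\) is \(i\)’s year of death, and \(N_{b_j}^F\) is the number of mature females in year \(b_j\).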

Both conditions are evaluated in \(j\)’s birth year, \(b_j\), which requires knowing (or estimating) the ages of the sampled animals.

\(\color{red}{\text{Ages are important!}}\)
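The two-case formula for the mother-offspring \(p_{ij}\) translates directly into code. This is a minimal sketch. It assumes ages and death years are known exactly. It also assumes the number of mature females per year, \(N_{b_j}^F\), is supplied as a lookup table; in a real fit those abundances would be functions of the parameters being estimated. `PotentialMother` and `mother_offspring_prob` are hypothetical names, and the numbers below are made up.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class PotentialMother:
    birth_year: int                    # b_i
    death_year: Optional[int] = None   # d_i; None if not known to have died


def mother_offspring_prob(mother: PotentialMother, offspring_birth_year: int,
                          age_mat: int, n_mature_females: Mapping[int, float]) -> float:
    """p_ij for potential mother i and offspring j (b_i < b_j), assuming
    knife-edge maturity at age_mat and equal reproductive success."""
    age_at_birth = offspring_birth_year - mother.birth_year   # a_i(b_j)
    if age_at_birth < age_mat:
        return 0.0                                            # immature at b_j
    if mother.death_year is not None and mother.death_year < offspring_birth_year:
        return 0.0                                            # dead before b_j
    return 1.0 / n_mature_females[offspring_birth_year]       # 1 / N^F_{b_j}


N_F = {2010: 2000.0}  # hypothetical number of mature females in 2010
print(mother_offspring_prob(PotentialMother(2000, 2012), 2010, 5, N_F))  # 0.0005
print(mother_offspring_prob(PotentialMother(2007), 2010, 5, N_F))        # 0.0 (age 3)
print(mother_offspring_prob(PotentialMother(2000, 2008), 2010, 5, N_F))  # 0.0 (dead)
```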

CKMR Assumptions

Accurate genotyping (no false positives!)
Population and sampling models are accurate
Kinship comparisons are “independent” (or close enough…)
No undiagnosed relationship between kinship probabilities and sampling probabilities that can’t be explained by observed (or inferred) covariates
- Age
- Spatial location
- Status (mating hierarchy)

CKMR Assumptions - Implications

Accurate genotyping (no false positives!)

We need enough genetic markers to tell the various kin categories apart. For parent-offspring pairs we might need only 200 or so SNPs, but for half-siblings it is nice to have 3,000-4,000 (after pruning ill-behaved loci).

\(\color{red}{\rightarrow \text{High quality tissue samples}}\)

CKMR Assumptions - Implications

Population and sampling models are accurate

For species with delayed reproductive maturity, we need to model pre-adult population dynamics, so we need some idea of early survival and reproductive schedules (decent early life history information!). We also need to get the underlying Leslie matrix right (pre- vs. post-breeding census, etc.).
Accurate sampling models are mostly about independent fates. E.g., we don’t want to treat mothers and dependent offspring harvested in the same year, or half-siblings harvested together, as independent samples.
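With delayed maturity, an age-structured population model links early survival and fecundity to \(N_{b_j}^F\), the number of mature females each year. This is a minimal sketch using a female-only Leslie matrix, and it should not be read as the workshop's model. All vital rates and starting numbers are hypothetical. The sketch does not handle the pre- vs. post-breeding census distinction, which changes what goes in the first row.

```python
def leslie_matrix(fecundity, survival):
    """Female-only Leslie matrix as a list of rows.

    fecundity[a]: female offspring per female in age class a (first row)
    survival[a]:  probability of surviving from age class a to a + 1 (sub-diagonal)
    """
    k = len(fecundity)
    if len(survival) != k - 1:
        raise ValueError("need k - 1 survival rates for k age classes")
    L = [[0.0] * k for _ in range(k)]
    L[0] = [float(f) for f in fecundity]
    for a, s in enumerate(survival):
        L[a + 1][a] = float(s)
    return L


def project(L, n):
    """One annual step of n_{t+1} = L n_t."""
    return [sum(L[r][c] * n[c] for c in range(len(n))) for r in range(len(L))]


def mature_females_by_year(L, n0, years, age_mat):
    """N^F_t for t = 0, ..., years - 1, with knife-edge maturity at age_mat."""
    n = [float(x) for x in n0]
    out = []
    for _ in range(years):
        out.append(sum(n[age_mat:]))
        n = project(L, n)
    return out


# Hypothetical life history: ages 0-4, maturity at age 2, no plus group
L = leslie_matrix(fecundity=[0, 0, 0.4, 0.45, 0.45], survival=[0.6, 0.8, 0.9, 0.9])
print(mature_females_by_year(L, n0=[100, 60, 48, 43, 39], years=3, age_mat=2))
```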

Do sampling events over-represent the frequency of kin pairs in the population?

If so, CKMR estimates are likely to be biased!!! One strategy is to omit certain categories of comparison (e.g., only make cross-cohort HSP comparisons; don’t make mother-offspring comparisons for females and young harvested at the same time).
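Omitting comparison categories amounts to filtering the list of pairwise comparisons before computing the pseudo-likelihood. This minimal sketch assumes each sample record carries a sex, a birth year, and a hypothetical `harvest_event` ID shared by animals taken at the same time. The field names and rules are illustrative; a real study would tailor them to its sampling design.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Sample:
    sample_id: str
    sex: str            # "F" or "M"
    birth_year: int
    harvest_event: str  # hypothetical ID shared by animals taken at the same time


def build_comparisons(samples):
    """Return (hsp_pairs, mother_offspring_pairs) after dropping the comparison
    categories most likely to over-represent kin pairs."""
    hsp, mop = [], []
    for a, b in combinations(samples, 2):
        # Half-siblings: cross-cohort comparisons only.
        if a.birth_year != b.birth_year:
            hsp.append((a.sample_id, b.sample_id))
        # Mother-offspring: older female vs. younger animal, not taken together.
        older, younger = (a, b) if a.birth_year < b.birth_year else (b, a)
        if (older.sex == "F" and older.birth_year < younger.birth_year
                and older.harvest_event != younger.harvest_event):
            mop.append((older.sample_id, younger.sample_id))
    return hsp, mop


samples = [
    Sample("A", "F", 2000, "h1"),
    Sample("B", "M", 2010, "h1"),  # harvested with female A
    Sample("C", "F", 2010, "h2"),
    Sample("D", "M", 2012, "h3"),
]
hsp_pairs, mop_pairs = build_comparisons(samples)
print(hsp_pairs)  # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'D'), ('C', 'D')]
print(mop_pairs)  # [('A', 'C'), ('A', 'D'), ('C', 'D')]
```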

CKMR Assumptions - Implications

Kinship comparisons are “independent” (or close enough…)

The quality of the pseudo-likelihood as an approximation decreases as the amount of relatedness in a population increases. The usual consequence of this kind of approximation in statistics is that precision is overstated (e.g., confidence intervals are too narrow).

CKMR has been conducted on populations as small as \(\approx 600\), but we don’t want to go much smaller.

CKMR Assumptions - Implications

No undiagnosed relationship between kinship probabilities and sampling probabilities that can’t be explained by observed (or inferred) covariates
- Age
- Spatial location
- Status (mating hierarchy)

Is there a relationship between kinship and sampling probabilities?

If so, CKMR estimates are likely to be biased!!! If covariates are available to explain this relationship, they should be modeled to fix the problem. In some cases (e.g., highly fecund individuals being more likely to be harvested), we might need to adjust our estimation strategy (e.g., leave out father-offspring comparisons for deer) or model these comparisons differently.

So what populations is CKMR good for?

Populations that are “not too big and not too small” (e.g., several hundred to ten million or so)
Need \(\approx 50\) kin pairs to produce reasonable estimates; the required number of samples increases with \(\sqrt{N}\)
Decent genetic variation (severe inbreeding may make it difficult to discriminate different kin pair types)
Good “mixing” (either through movement or through sampling)
~~Group living species~~
One mother and one father! No weird breeding systems (e.g., armadillos)
Ages are extremely helpful (teeth? epigenetics? sampling young?)
Some will require case-specific developments (philopatry, spatial structure, pair bonding)
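The \(\sqrt{N}\) scaling follows from the same arithmetic as the juvenile/adult example. With \(n_j\) juveniles and \(M\) adults sampled from \(N\) adults, the expected number of POPs is about \(2 n_j M / N\). Setting \(n_j = M = n\) and asking for \(\approx 50\) pairs gives \(n \approx \sqrt{25N} = 5\sqrt{N}\). The back-of-envelope sketch below assumes POPs only, a single cohort, and that every comparison is eligible. Real designs that also use HSPs or multiple years will differ.

```python
import math


def expected_pops(n_juv: int, n_adults_sampled: int, N_adults: float) -> float:
    """Expected parent-offspring matches: each juvenile has two parents, and each
    parent is among the sampled adults with probability n_adults_sampled / N_adults."""
    return 2 * n_juv * n_adults_sampled / N_adults


def samples_needed(N_adults: float, target_pairs: float = 50) -> int:
    """Smallest n with expected_pops(n, n, N_adults) >= target_pairs,
    i.e., n = ceil(sqrt(target_pairs * N_adults / 2))."""
    return math.ceil(math.sqrt(target_pairs * N_adults / 2))


for N in (100, 10_000, 1_000_000):
    print(N, samples_needed(N))  # prints 100 50, then 10000 500, then 1000000 5000
```

Quadrupling \(N\) only doubles the required number of samples of each type.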

So what populations is CKMR good for?

Completed or underway as of 2022 (c/o M. Bravington)

How easy is it to conduct CKMR experiments?

The skill level required probably depends on the type of data (e.g., POP-only vs. POP+HSP, single vs. multiple cohorts)
Relatively low cost, especially after markers and aging methods are developed (epigenetics?)
You’re going to want a biologist, a biometrician, and a geneticist involved. Very few people have all of these skills, and it’s a lot to ask of a single person (especially a grad student!!!)
Many models will need to be population- and data-specific and will require bespoke code. That said, there are examples and templates out there that will help.

CKMR: historical ecology

CKMR “looks backwards”: inference is based on ERRO at the times the offspring were born
Precision tends to be best “back in time”; precision for the present day is usually not as good (especially for long-lived species; see beluga example here)
Implications for monitoring/management
There are sometimes ways to help improve precision in the present by designing a CKMR experiment correctly!

A CKMR case study: Bearded seals

Paper in prep (Taras, Conn, Quakenbush, Bravington, Baylis). Annual sampling of bearded seal subsistence harvests (tissue samples + teeth) by ADF&G

A CKMR case study: Bearded seals

Our paper isn’t published yet, so I can’t make all of the data public. But we can still use life history information, approximate sample sizes, and kin-finding data to motivate ideas and conduct some modeling exercises.

The Beringia DPS is thought to number 400-500K seals!
We had something like 2000 tissue samples over \(\approx 20\) years.
Can we use this to get an idea of overall abundance? What special things will we need to consider when fitting models to these data?
