library(tidyverse)
library(lubridate)

Doing Analysis

Using commercial fishery GSI dataset to estimate the number of copies of the early run haplotype in fish caught in the ocean but originating from the Klamath Basin.

This provides an estimate of how abundant the early-run haplotype is within the Klamath Basin.

To do this analysis we need a couple things.

  1. maturation probabilites for different age Chinook
  2. age-structure of returns sampled at hatcheries in the Klamath Basin
  3. Allele frequencies of the fish captured in commercial fishery.

Values for 1 and 2 are taken from Hankin and Logan (2010) A PRELIMINARY ANALYSIS OF CHINOOK SALMON CODED-WIRE TAG RECOVERY DATA FROM IRON GATE, TRINITY RIVER AND COLE RIVERS HATCHERIES, BROOD YEARS 1978-2004.

Table 4.13 has the age-structure of escapement at Trinity River hatchery for fingerling and yearling releases.

Maturation probabilites are in Table 4.8.

t48_falldata <- tibble(age = rep(c(2, 3, 4), 4), ecotype = rep("fall", 12), pop = rep(rep(c("IGH", "TRH"), each = 3), 2), rel_type = rep(c("fingerling", "yearling"), each = 6), mat_value = c(0.0315, .4808, .9327, .1029, .6447, .9362, .0087, .2385, .9348, .0363, .5629, .9407))

t48_springdata <- tibble(age = rep(c(2, 3, 4), 2), ecotype = rep("spring", 6), pop = rep(c("TRH"), 6), rel_type = rep(c("fingerling", "yearling"), each = 3), mat_value = c(0.0409, .5577, .9365, .0199, .3526, .9031))

t148_data <- bind_rows(t48_falldata, t48_springdata)

t148_data %>%
  group_by(age, pop, ecotype) %>%
  summarise(mat_mean = mean(mat_value)) %>%
  arrange(pop, ecotype) %>%
  group_by(age) %>%
  summarise(mat_mean_overall = mean(mat_mean))

age-structure of TRH returns, data from table 4.13 in Hankin 2010

age_structure <- tibble(age = rep(c(3, 4, 5), 2), rel_type = rep(c("fingerling", "yearling"), each = 3), n_escape = c(7401, 2691, 93, 19133, 9465, 140)) %>%
  group_by(rel_type) %>%
  mutate(
    n_tot = sum(n_escape),
    pct_age = round(n_escape / n_tot, 2)
  ) %>%
  group_by(age) %>%
  summarise(mean_pct_age = mean(pct_age))

Definitions; [FW = freshwater, yo = years old]

P3 = proportion of 3 yo in FW P4 = proportion of 4 year olds in FW M3 = maturation rate for 3 yo M4 = maturation rate for 4 yo S3 = saltwater proportion of 3yo S4 = saltwater proportion of 4yo

S3M3 = unscaled proportion of 3 yo in FW S4M4 = unscaled proportion of 4 yo in FW

S3M3/(S3M3+S4M4) = P3 therefore S3M3 is proportional to P3 and S4M4 would be proportional to P4. Rearrange the proportionality and we get P3/M3 is proportional to S3, P4/M4 is proportional to S4

P3 <- 0.7
M3 <- 0.47
P4 <- 0.3
M4 <- 0.93

S3_unnorm <- P3 / M3
S4_unnorm <- P4 / M4

S3_norm <- round(S3_unnorm / sum(S3_unnorm, S4_unnorm), 3) # rounded to 3 digits to keep things clean
S4_norm <- round(S4_unnorm / sum(S3_unnorm, S4_unnorm), 3)

Now calculate the probability that either a 3 yo or 4 yo will enter freshwater this year.

p_migrate_3yo <- S3_norm * M3
p_migrate_4yo <- S4_norm * M4

p_migrate <- sum(p_migrate_3yo, p_migrate_4yo)

p_migrate = 0.55188 Expand the observed counts by the proportion of genotype EE/EL/LL that has escaped to FW.

Let: q = fraction of fish with genotype EE/EL/LL that have escaped to freshwater by month 0.45 = 1-p_migrate

That leads to Ngeno = ngeno/(1-0.45q)

Ngeno = the number of genotype XX fish that would have been encountered given no fish left saltwater.

rosa_data <- read_rds("./data/101-RoSA-klamath-estuary-samples.rds")

estuary_meta <- read_csv("./data/102-Klamath-entryDATA_withNMFSID.csv")
## Warning: Missing column names filled in: 'X16' [16]
estuary_data <- left_join(estuary_meta, rosa_data, by = c("NMFS ID" = "NMFS_DNA_ID")) %>%
  dplyr::select(-ID) %>%
  rename(NMFS_DNA_ID = "NMFS ID") %>%
  dplyr::select(NMFS_DNA_ID, rosa, everything())

q_df <- estuary_data %>%
  filter(!str_detect(rosa, "\\?")) %>%
  filter(year == 2010) %>% # using 2010 only because there were very few heterozygotes in 2009.
  mutate(
    month = str_sub(month, 1, 3),
    month = factor(month, levels = c("May", "Jun", "Jul", "Aug", "Sep", "Oct"))
  ) %>%
  count(rosa, month) %>%
  group_by(rosa) %>%
  mutate(
    n_geno = sum(n),
    q = cumsum(round(n / n_geno, 3)),
    denom = 1 - (q * .45),
    tmp = paste0(rosa, "_", month)
  )

q_df

Load ocean fishery data

ocean_data <- read_rds("./data/108-Klamath-EE-frequency-model-data-oceanfishery.rds")

2010 ocean analysis

ocean_2010 <- ocean_data %>%
  mutate(cMonth = month(COLLECTION_DATE, label = TRUE)) %>%
  filter(cYear == 2010) %>%
  count(rosa, cMonth) %>%
  mutate(tmp = paste0(rosa, "_", cMonth))

counts2expand <- q_df %>%
  select(tmp, denom) %>%
  left_join(ocean_2010, ., "tmp") %>%
  mutate(denom = ifelse(rosa.x == "LLLLLLLL" & cMonth %in% c("May", "Jun"), 1.0, denom)) %>%
  replace_na(list(denom = 0.55)) %>%
  select(-rosa.y) %>%
  ungroup() %>%
  rename(rosa = rosa.x)

counts2010e <- counts2expand %>%
  group_by(rosa) %>%
  mutate(
    Ngeno = n / lag(denom, 1),
    Ngeno = ifelse(is.na(Ngeno), n, Ngeno)
  ) %>%
  group_by(rosa) %>%
  mutate(sumNgeno = sum(Ngeno)) %>%
  ungroup() %>%
  mutate(sumNfish = sum(unique(sumNgeno)))

counts2010e

Ngeno is the expanded count, the math to calculate this is n/denom lagged 1

sumNgeno is the sum of all expanded RoSA genotype counts by genotype,

sumNfish is the sum of all expanded RoSA counts

For example; EL cMonth 6 n caught = 4, denom lagged 1 = 0.99145 –> 4 / 0.99145 = 4.034495

For EL cMonth 5 there is no expansion because we assume no fish have left saltwater. Same for EE, no expansion for cMonth 5.

Frequency of the E and L haplotypes after accounting for maturation probability and fish that escape to freshwater in 2010:

counts2010e %>%
  group_by(rosa) %>%
  slice(1) %>%
  ungroup() %>%
  summarise(
    E_freq = sum(sumNgeno * c(2, 1, 0)) / (sumNfish[1] * 2),
    L_freq = sum(sumNgeno * c(0, 1, 2)) / (sumNfish[1] * 2)
  )

2011 ocean analysis

Note limited sample size (n=72) and fishery duration for 2011 (samples only from july and august)

ocean_2011 <- ocean_data %>%
  mutate(cMonth = month(COLLECTION_DATE, label = TRUE)) %>%
  filter(cYear == 2011) %>%
  count(rosa, cMonth) %>%
  mutate(tmp = paste0(rosa, "_", cMonth))

counts2expand_2011 <- q_df %>%
  select(tmp, denom) %>%
  left_join(., ocean_2011, "tmp") %>%
  mutate(denom = ifelse(rosa.x == "LLLLLLLL" & cMonth %in% c("May", "Jun"), 1.0, denom)) %>%
  replace_na(list(denom = 0.55)) %>%
  select(-rosa.y) %>%
  ungroup() %>%
  rename(rosa = rosa.x)

counts2011e <- counts2expand_2011 %>%
  group_by(rosa) %>%
  mutate(
    Ngeno = n / lag(denom, 1),
    Ngeno = ifelse(is.na(Ngeno), n, Ngeno)
  ) %>%
  filter(!is.na(Ngeno)) %>%
  mutate(sumNgeno = sum(Ngeno)) %>%
  ungroup() %>%
  mutate(sumNfish = sum(unique(sumNgeno)))

counts2011e

Frequency of the E and L haplotypes after accounting for maturation probability and fish that escape to freshwater in 2011

counts2011e %>%
  group_by(rosa) %>%
  slice(1) %>%
  ungroup() %>%
  summarise(
    E_freq = sum(sumNgeno * c(2, 1, 0)) / (sumNfish[1] * 2),
    L_freq = sum(sumNgeno * c(0, 1, 2)) / (sumNfish[1] * 2)
  )

2012 ocean analysis

ocean_2012 <- ocean_data %>%
  mutate(cMonth = month(COLLECTION_DATE, label = TRUE)) %>%
  filter(cYear == 2012) %>%
  count(rosa, cMonth) %>%
  mutate(tmp = paste0(rosa, "_", cMonth))

counts2expand_2012 <- q_df %>%
  select(tmp, denom) %>%
  left_join(ocean_2012, ., "tmp") %>%
  mutate(denom = ifelse(rosa.x == "LLLLLLLL" & cMonth %in% c("May", "Jun"), 1.0, denom)) %>%
  replace_na(list(denom = 0.55)) %>%
  select(-rosa.y) %>%
  ungroup() %>%
  rename(rosa = rosa.x)

counts2012e <- counts2expand_2012 %>%
  group_by(rosa) %>%
  mutate(
    Ngeno = n / lag(denom, 1),
    Ngeno = ifelse(is.na(Ngeno), n, Ngeno)
  ) %>%
  group_by(rosa) %>%
  mutate(sumNgeno = sum(Ngeno)) %>%
  ungroup() %>%
  mutate(sumNfish = sum(unique(sumNgeno)))

counts2012e

Frequency of the E and L haplotype after accounting for maturation probability and fish that escape to freshwater in 2012

counts2012e %>%
  group_by(rosa) %>%
  slice(1) %>%
  ungroup() %>%
  summarise(
    E_freq = sum(sumNgeno * c(2, 1, 0)) / (sumNfish[1] * 2),
    L_freq = sum(sumNgeno * c(0, 1, 2)) / (sumNfish[1] * 2)
  )

Uncorrected estimates of RoSA haplotype frequencies in ocean

First get the oncorrected estimates of the frequencies of the different genotypes

uncorr_geno_freqs <- ocean_data %>%
  count(cYear, rosa) %>%
  group_by(cYear) %>%
  mutate(freq = n / sum(n))

uncorr_geno_freqs

Then add up the allele freqs from the genotypes. \(\times 1\) for each EE and \(\times 1/2\) for each heterozygote:

alle_freqs <- uncorr_geno_freqs %>%
  mutate(
    E_alle_freq = case_when(
      rosa == "EEEEEEEE" ~ freq,
      rosa == "HHHHHHHH" ~ freq * 0.5,
      TRUE ~ 0.0
    ),
    L_alle_freq = case_when(
      rosa == "LLLLLLLL" ~ freq,
      rosa == "HHHHHHHH" ~ freq * 0.5,
      TRUE ~ 0.0
    )
  )
alle_freqs

And now summarise that by year and format it for easy reading.

alle_freqs %>%
  group_by(cYear) %>%
  summarise(
    E_haplo_freq = sum(E_alle_freq),
    L_haplo_freq = sum(L_alle_freq)
  )

Session Info

sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.2 (2019-12-12)
##  os       macOS Sierra 10.12.6        
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/Denver              
##  date     2020-05-14                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date       lib source        
##  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
##  backports     1.1.6   2020-04-05 [1] CRAN (R 3.6.2)
##  broom         0.5.6   2020-04-20 [1] CRAN (R 3.6.2)
##  cellranger    1.1.0   2016-07-27 [1] CRAN (R 3.6.0)
##  cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.0)
##  colorspace    1.4-1   2019-03-18 [1] CRAN (R 3.6.0)
##  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
##  DBI           1.1.0   2019-12-15 [1] CRAN (R 3.6.0)
##  dbplyr        1.4.3   2020-04-19 [1] CRAN (R 3.6.2)
##  digest        0.6.25  2020-02-23 [1] CRAN (R 3.6.0)
##  dplyr       * 0.8.5   2020-03-07 [1] CRAN (R 3.6.0)
##  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.0)
##  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.0)
##  fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.0)
##  forcats     * 0.5.0   2020-03-01 [1] CRAN (R 3.6.0)
##  fs            1.4.1   2020-04-04 [1] CRAN (R 3.6.2)
##  generics      0.0.2   2018-11-29 [1] CRAN (R 3.6.0)
##  ggplot2     * 3.3.0   2020-03-05 [1] CRAN (R 3.6.0)
##  glue          1.4.0   2020-04-03 [1] CRAN (R 3.6.2)
##  gtable        0.3.0   2019-03-25 [1] CRAN (R 3.6.0)
##  haven         2.2.0   2019-11-08 [1] CRAN (R 3.6.0)
##  hms           0.5.3   2020-01-08 [1] CRAN (R 3.6.0)
##  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.0)
##  httr          1.4.1   2019-08-05 [1] CRAN (R 3.6.0)
##  jsonlite      1.6.1   2020-02-02 [1] CRAN (R 3.6.0)
##  knitr         1.28    2020-02-06 [1] CRAN (R 3.6.0)
##  lattice       0.20-38 2018-11-04 [2] CRAN (R 3.6.2)
##  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 3.6.0)
##  lubridate   * 1.7.8   2020-04-06 [1] CRAN (R 3.6.2)
##  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
##  modelr        0.1.6   2020-02-22 [1] CRAN (R 3.6.0)
##  munsell       0.5.0   2018-06-12 [1] CRAN (R 3.6.0)
##  nlme          3.1-142 2019-11-07 [2] CRAN (R 3.6.2)
##  pillar        1.4.3   2019-12-20 [1] CRAN (R 3.6.0)
##  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.0)
##  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 3.6.2)
##  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.0)
##  Rcpp          1.0.4   2020-03-17 [1] CRAN (R 3.6.0)
##  readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.6.0)
##  readxl        1.3.1   2019-03-13 [1] CRAN (R 3.6.0)
##  reprex        0.3.0   2019-05-16 [1] CRAN (R 3.6.0)
##  rlang         0.4.5   2020-03-01 [1] CRAN (R 3.6.0)
##  rmarkdown     2.1     2020-01-20 [1] CRAN (R 3.6.0)
##  rstudioapi    0.11    2020-02-07 [1] CRAN (R 3.6.0)
##  rvest         0.3.5   2019-11-08 [1] CRAN (R 3.6.0)
##  scales        1.1.0   2019-11-18 [1] CRAN (R 3.6.0)
##  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
##  stringi       1.4.6   2020-02-17 [1] CRAN (R 3.6.0)
##  stringr     * 1.4.0   2019-02-10 [1] CRAN (R 3.6.0)
##  tibble      * 3.0.1   2020-04-20 [1] CRAN (R 3.6.2)
##  tidyr       * 1.0.2   2020-01-24 [1] CRAN (R 3.6.0)
##  tidyselect    1.0.0   2020-01-27 [1] CRAN (R 3.6.0)
##  tidyverse   * 1.3.0   2019-11-21 [1] CRAN (R 3.6.0)
##  vctrs         0.2.4   2020-03-10 [1] CRAN (R 3.6.0)
##  withr         2.2.0   2020-04-20 [1] CRAN (R 3.6.2)
##  xfun          0.13    2020-04-13 [1] CRAN (R 3.6.2)
##  xml2          1.3.2   2020-04-23 [1] CRAN (R 3.6.2)
##  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.0)
## 
## [1] /Users/eriq/Library/R/3.6/library
## [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

Running Time

Running the code and rendering this notebook required approximately this much time on a Mac laptop of middling speed:

Sys.time() - start_time
## Time difference of 1.529885 secs