library(tidyverse)
library(lubridate)
Using commercial fishery GSI dataset to estimate the number of copies of the early run haplotype in fish caught in the ocean but originating from the Klamath Basin.
This provides an estimate of how abundant the early-run haplotype is within the Klamath Basin.
To do this analysis we need a couple things.
Values for 1 and 2 are taken from Hankin and Logan (2010) A PRELIMINARY ANALYSIS OF CHINOOK SALMON CODED-WIRE TAG RECOVERY DATA FROM IRON GATE, TRINITY RIVER AND COLE RIVERS HATCHERIES, BROOD YEARS 1978-2004.
Table 4.13 has the age-structure of escapement at Trinity River hatchery for fingerling and yearling releases.
Maturation probabilites are in Table 4.8.
t48_falldata <- tibble(age = rep(c(2, 3, 4), 4), ecotype = rep("fall", 12), pop = rep(rep(c("IGH", "TRH"), each = 3), 2), rel_type = rep(c("fingerling", "yearling"), each = 6), mat_value = c(0.0315, .4808, .9327, .1029, .6447, .9362, .0087, .2385, .9348, .0363, .5629, .9407))
t48_springdata <- tibble(age = rep(c(2, 3, 4), 2), ecotype = rep("spring", 6), pop = rep(c("TRH"), 6), rel_type = rep(c("fingerling", "yearling"), each = 3), mat_value = c(0.0409, .5577, .9365, .0199, .3526, .9031))
t148_data <- bind_rows(t48_falldata, t48_springdata)
t148_data %>%
group_by(age, pop, ecotype) %>%
summarise(mat_mean = mean(mat_value)) %>%
arrange(pop, ecotype) %>%
group_by(age) %>%
summarise(mat_mean_overall = mean(mat_mean))
age-structure of TRH returns, data from table 4.13 in Hankin 2010
age_structure <- tibble(age = rep(c(3, 4, 5), 2), rel_type = rep(c("fingerling", "yearling"), each = 3), n_escape = c(7401, 2691, 93, 19133, 9465, 140)) %>%
group_by(rel_type) %>%
mutate(
n_tot = sum(n_escape),
pct_age = round(n_escape / n_tot, 2)
) %>%
group_by(age) %>%
summarise(mean_pct_age = mean(pct_age))
Definitions; [FW = freshwater, yo = years old]
P3 = proportion of 3 yo in FW P4 = proportion of 4 year olds in FW M3 = maturation rate for 3 yo M4 = maturation rate for 4 yo S3 = saltwater proportion of 3yo S4 = saltwater proportion of 4yo
S3M3 = unscaled proportion of 3 yo in FW S4M4 = unscaled proportion of 4 yo in FW
S3M3/(S3M3+S4M4) = P3 therefore S3M3 is proportional to P3 and S4M4 would be proportional to P4. Rearrange the proportionality and we get P3/M3 is proportional to S3, P4/M4 is proportional to S4
P3 <- 0.7
M3 <- 0.47
P4 <- 0.3
M4 <- 0.93
S3_unnorm <- P3 / M3
S4_unnorm <- P4 / M4
S3_norm <- round(S3_unnorm / sum(S3_unnorm, S4_unnorm), 3) # rounded to 3 digits to keep things clean
S4_norm <- round(S4_unnorm / sum(S3_unnorm, S4_unnorm), 3)
Now calculate the probability that either a 3 yo or 4 yo will enter freshwater this year.
p_migrate_3yo <- S3_norm * M3
p_migrate_4yo <- S4_norm * M4
p_migrate <- sum(p_migrate_3yo, p_migrate_4yo)
p_migrate = 0.55188 Expand the observed counts by the proportion of genotype EE/EL/LL that has escaped to FW.
Let: q = fraction of fish with genotype EE/EL/LL that have escaped to freshwater by month 0.45 = 1-p_migrate
That leads to Ngeno = ngeno/(1-0.45q)
Ngeno = the number of genotype XX fish that would have been encountered given no fish left saltwater.
rosa_data <- read_rds("./data/101-RoSA-klamath-estuary-samples.rds")
estuary_meta <- read_csv("./data/102-Klamath-entryDATA_withNMFSID.csv")
## Warning: Missing column names filled in: 'X16' [16]
estuary_data <- left_join(estuary_meta, rosa_data, by = c("NMFS ID" = "NMFS_DNA_ID")) %>%
dplyr::select(-ID) %>%
rename(NMFS_DNA_ID = "NMFS ID") %>%
dplyr::select(NMFS_DNA_ID, rosa, everything())
q_df <- estuary_data %>%
filter(!str_detect(rosa, "\\?")) %>%
filter(year == 2010) %>% # using 2010 only because there were very few heterozygotes in 2009.
mutate(
month = str_sub(month, 1, 3),
month = factor(month, levels = c("May", "Jun", "Jul", "Aug", "Sep", "Oct"))
) %>%
count(rosa, month) %>%
group_by(rosa) %>%
mutate(
n_geno = sum(n),
q = cumsum(round(n / n_geno, 3)),
denom = 1 - (q * .45),
tmp = paste0(rosa, "_", month)
)
q_df
ocean_data <- read_rds("./data/108-Klamath-EE-frequency-model-data-oceanfishery.rds")
ocean_2010 <- ocean_data %>%
mutate(cMonth = month(COLLECTION_DATE, label = TRUE)) %>%
filter(cYear == 2010) %>%
count(rosa, cMonth) %>%
mutate(tmp = paste0(rosa, "_", cMonth))
counts2expand <- q_df %>%
select(tmp, denom) %>%
left_join(ocean_2010, ., "tmp") %>%
mutate(denom = ifelse(rosa.x == "LLLLLLLL" & cMonth %in% c("May", "Jun"), 1.0, denom)) %>%
replace_na(list(denom = 0.55)) %>%
select(-rosa.y) %>%
ungroup() %>%
rename(rosa = rosa.x)
counts2010e <- counts2expand %>%
group_by(rosa) %>%
mutate(
Ngeno = n / lag(denom, 1),
Ngeno = ifelse(is.na(Ngeno), n, Ngeno)
) %>%
group_by(rosa) %>%
mutate(sumNgeno = sum(Ngeno)) %>%
ungroup() %>%
mutate(sumNfish = sum(unique(sumNgeno)))
counts2010e
Ngeno is the expanded count, the math to calculate this is n/denom lagged 1
sumNgeno is the sum of all expanded RoSA genotype counts by genotype,
sumNfish is the sum of all expanded RoSA counts
For example; EL cMonth 6 n caught = 4, denom lagged 1 = 0.99145 –> 4 / 0.99145 = 4.034495
For EL cMonth 5 there is no expansion because we assume no fish have left saltwater. Same for EE, no expansion for cMonth 5.
Frequency of the E and L haplotypes after accounting for maturation probability and fish that escape to freshwater in 2010:
counts2010e %>%
group_by(rosa) %>%
slice(1) %>%
ungroup() %>%
summarise(
E_freq = sum(sumNgeno * c(2, 1, 0)) / (sumNfish[1] * 2),
L_freq = sum(sumNgeno * c(0, 1, 2)) / (sumNfish[1] * 2)
)
Note limited sample size (n=72) and fishery duration for 2011 (samples only from july and august)
ocean_2011 <- ocean_data %>%
mutate(cMonth = month(COLLECTION_DATE, label = TRUE)) %>%
filter(cYear == 2011) %>%
count(rosa, cMonth) %>%
mutate(tmp = paste0(rosa, "_", cMonth))
counts2expand_2011 <- q_df %>%
select(tmp, denom) %>%
left_join(., ocean_2011, "tmp") %>%
mutate(denom = ifelse(rosa.x == "LLLLLLLL" & cMonth %in% c("May", "Jun"), 1.0, denom)) %>%
replace_na(list(denom = 0.55)) %>%
select(-rosa.y) %>%
ungroup() %>%
rename(rosa = rosa.x)
counts2011e <- counts2expand_2011 %>%
group_by(rosa) %>%
mutate(
Ngeno = n / lag(denom, 1),
Ngeno = ifelse(is.na(Ngeno), n, Ngeno)
) %>%
filter(!is.na(Ngeno)) %>%
mutate(sumNgeno = sum(Ngeno)) %>%
ungroup() %>%
mutate(sumNfish = sum(unique(sumNgeno)))
counts2011e
Frequency of the E and L haplotypes after accounting for maturation probability and fish that escape to freshwater in 2011
counts2011e %>%
group_by(rosa) %>%
slice(1) %>%
ungroup() %>%
summarise(
E_freq = sum(sumNgeno * c(2, 1, 0)) / (sumNfish[1] * 2),
L_freq = sum(sumNgeno * c(0, 1, 2)) / (sumNfish[1] * 2)
)
ocean_2012 <- ocean_data %>%
mutate(cMonth = month(COLLECTION_DATE, label = TRUE)) %>%
filter(cYear == 2012) %>%
count(rosa, cMonth) %>%
mutate(tmp = paste0(rosa, "_", cMonth))
counts2expand_2012 <- q_df %>%
select(tmp, denom) %>%
left_join(ocean_2012, ., "tmp") %>%
mutate(denom = ifelse(rosa.x == "LLLLLLLL" & cMonth %in% c("May", "Jun"), 1.0, denom)) %>%
replace_na(list(denom = 0.55)) %>%
select(-rosa.y) %>%
ungroup() %>%
rename(rosa = rosa.x)
counts2012e <- counts2expand_2012 %>%
group_by(rosa) %>%
mutate(
Ngeno = n / lag(denom, 1),
Ngeno = ifelse(is.na(Ngeno), n, Ngeno)
) %>%
group_by(rosa) %>%
mutate(sumNgeno = sum(Ngeno)) %>%
ungroup() %>%
mutate(sumNfish = sum(unique(sumNgeno)))
counts2012e
Frequency of the E and L haplotype after accounting for maturation probability and fish that escape to freshwater in 2012
counts2012e %>%
group_by(rosa) %>%
slice(1) %>%
ungroup() %>%
summarise(
E_freq = sum(sumNgeno * c(2, 1, 0)) / (sumNfish[1] * 2),
L_freq = sum(sumNgeno * c(0, 1, 2)) / (sumNfish[1] * 2)
)
First get the oncorrected estimates of the frequencies of the different genotypes
uncorr_geno_freqs <- ocean_data %>%
count(cYear, rosa) %>%
group_by(cYear) %>%
mutate(freq = n / sum(n))
uncorr_geno_freqs
Then add up the allele freqs from the genotypes. \(\times 1\) for each EE and \(\times 1/2\) for each heterozygote:
alle_freqs <- uncorr_geno_freqs %>%
mutate(
E_alle_freq = case_when(
rosa == "EEEEEEEE" ~ freq,
rosa == "HHHHHHHH" ~ freq * 0.5,
TRUE ~ 0.0
),
L_alle_freq = case_when(
rosa == "LLLLLLLL" ~ freq,
rosa == "HHHHHHHH" ~ freq * 0.5,
TRUE ~ 0.0
)
)
alle_freqs
And now summarise that by year and format it for easy reading.
alle_freqs %>%
group_by(cYear) %>%
summarise(
E_haplo_freq = sum(E_alle_freq),
L_haplo_freq = sum(L_alle_freq)
)
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.2 (2019-12-12)
## os macOS Sierra 10.12.6
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/Denver
## date 2020-05-14
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
## backports 1.1.6 2020-04-05 [1] CRAN (R 3.6.2)
## broom 0.5.6 2020-04-20 [1] CRAN (R 3.6.2)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0)
## cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
## colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
## DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.0)
## dbplyr 1.4.3 2020-04-19 [1] CRAN (R 3.6.2)
## digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
## dplyr * 0.8.5 2020-03-07 [1] CRAN (R 3.6.0)
## ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
## fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
## forcats * 0.5.0 2020-03-01 [1] CRAN (R 3.6.0)
## fs 1.4.1 2020-04-04 [1] CRAN (R 3.6.2)
## generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
## ggplot2 * 3.3.0 2020-03-05 [1] CRAN (R 3.6.0)
## glue 1.4.0 2020-04-03 [1] CRAN (R 3.6.2)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
## haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.0)
## hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.0)
## htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
## httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
## jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.6.0)
## knitr 1.28 2020-02-06 [1] CRAN (R 3.6.0)
## lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.2)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.0)
## lubridate * 1.7.8 2020-04-06 [1] CRAN (R 3.6.2)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
## modelr 0.1.6 2020-02-22 [1] CRAN (R 3.6.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
## nlme 3.1-142 2019-11-07 [2] CRAN (R 3.6.2)
## pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 3.6.2)
## R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
## Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.0)
## readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
## readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0)
## reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.0)
## rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.0)
## rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0)
## rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.0)
## rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.0)
## scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
## stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.0)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
## tibble * 3.0.1 2020-04-20 [1] CRAN (R 3.6.2)
## tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.6.0)
## tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.0)
## tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.0)
## vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.0)
## withr 2.2.0 2020-04-20 [1] CRAN (R 3.6.2)
## xfun 0.13 2020-04-13 [1] CRAN (R 3.6.2)
## xml2 1.3.2 2020-04-23 [1] CRAN (R 3.6.2)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.0)
##
## [1] /Users/eriq/Library/R/3.6/library
## [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
Running the code and rendering this notebook required approximately this much time on a Mac laptop of middling speed:
Sys.time() - start_time
## Time difference of 1.529885 secs