ROSA analysis of klamath at-entry fish with fatness (non-water fraction). Conducting a model selection approach to evaluate the influence of RoSA genotype on fatness.

library(tidyverse)

load RoSA genotype data and add estuary metadata to it.

rosa_data <- read_rds("./data/101-RoSA-klamath-estuary-samples.rds")

estuary_meta <- read_csv("./data/102-Klamath-entryDATA_withNMFSID.csv")
## Warning: Missing column names filled in: 'X16' [16]
rosa_meta <- left_join(estuary_meta, rosa_data, by = c("NMFS ID" = "NMFS_DNA_ID")) %>%
  dplyr::select(-ID) %>%
  rename(NMFS_DNA_ID = "NMFS ID") %>%
  dplyr::select(NMFS_DNA_ID, rosa, everything())

Remove all samples with ANY missing data. And remove the recombinant genotypes.

rosa_stats <- rosa_meta %>%
  filter(
    !str_detect(rosa, "\\?"),
    rosa %in% c("EEEEEEEE", "HHHHHHHH", "LLLLLLLL")
  )

plot fatness as function of return date

rosa_stats %>%
  mutate(rosa_f = factor(rosa, levels = c("EEEEEEEE", "HHHHHHHH", "LLLLLLLL"))) %>%
  ggplot(., aes(x = julian, y = drywet, fill = rosa_f, colour = rosa_f)) +
  geom_point() +
  xlab(label = "Estuary sampling date") +
  ylab(label = "non-water fraction") +
  scale_fill_discrete(name = "Genotype") +
  scale_colour_discrete(name = "Genotype") +
  theme_bw() +
  scale_x_continuous(breaks = c(121, 152, 182, 213, 244, 274, 305), labels = c(
    "May-01", "June-01",
    "July-01", "Aug-01", "Sept-01", "Oct-01", "Nov-01"
  )) +
  theme(
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14), axis.text.x = element_text(angle = 45, hjust = 1)
  ) +
  guides(
    fill = FALSE,
    colour = FALSE
  ) +
  facet_grid(year ~ sex)

Start mixed effects model analysis to determine if adding RoSA as a random effect predicts Gonadosomatic Index better than just a model with estuary sampling date.

stat_data <- rosa_stats %>%
  mutate(
    sex_f = factor(sex, levels = c("f", "m")),
    year_f = factor(year, levels = c("2009", "2010")),
    rosa_f = factor(rosa, levels = c("EEEEEEEE", "HHHHHHHH", "LLLLLLLL")),
    log_est = log(julian),
    log_est2 = log(julian)^2
  ) %>%
  rename(fatness = drywet)

mixed effects model analysis to determine if adding RoSA as a random effect predicts Gonadosomatic Index better than just a model with estuary sampling date.

library(nlme)

M0 <- gls(fatness ~ 1 + log_est + sex + year_f, method = "REML", data = stat_data)

exploratory plotting to look at model assumptions.

e1 <- resid(M0, type = "normalized")
plot(e1 ~ log_est, stat_data)

plot(e1 ~ sex_f, stat_data)

plot(e1 ~ year_f, stat_data)

hist(e1, main = "Histogram of residuals") # looks ok

None of that is concerning or indicative of violations of any assumptions.

specify random effects model and begin model selection

mm1 <- lme(fatness ~ 1 + log_est + sex + year_f, random = ~ 1 | rosa_f, method = "REML", data = stat_data) # random intercept model

likelihood ratio test on random intercept vs GLS model

anova(M0, mm1)

Adding a random intercept for RoSA does not improve the model. When forcing the 3 RoSA genotypes to have the same slope (same effect of estuary date on fatness) there is no support for differing intercepts (i.e. all RoSA genotypes start with the same fatness at day 0).

Lets see what happen when a random intercept and slope for RoSA are included. A random slope would indicate that the strength of the relationship between fatness and estuary entry date is not the same among RoSA genotypes.

mm2 <- lme(fatness ~ 1 + log_est + year_f + sex, random = ~ 0 + log_est | rosa_f, method = "REML", data = stat_data)

LRT on random slope vs GLS

anova(M0, mm2)

A random slope model is not preferred over the non-random model.

Let’s see if the random slope and intercept for RoSA is the most preferred model

lme(fatness ~ 1 + log_est + sex + year_f, random = ~ 1 + log_est | rosa_f, method = "REML", data = stat_data)
## Error in lme.formula(fatness ~ 1 + log_est + sex + year_f, random = ~1 + : nlminb problem, convergence error code = 1
##   message = iteration limit reached without convergence (10)

That error is due to singularity and not a convergence limit. I increased the number of iterations to 1000 (50 is the default) and it returns the same error. The singularity is because the model is overfit OR because th variance is too damn small for the model to converge. If the model is overfit then that tells us the random slope and intercept model wouldn’t be preferred over the GLS model. Let’s multiply fatness by 10 to increase the magnitude of the variance and see if that gets us a model than converges.

stat_data <- stat_data %>%
  mutate(fat_new = fatness * 10)

mm3_new <- lme(fat_new ~ 1 + log_est + sex + year_f, random = ~ 1 + log_est | rosa_f, method = "REML", data = stat_data)

m0_new <- gls(fat_new ~ 1 + log_est + sex + year_f, method = "REML", data = stat_data)

Model selection using liklihood ratio test using the original fatness values multiplied by 10 to increase variance value and get the variance estimate to not be zero.

LRT random slope and intercept VS GLS

anova(m0_new, mm3_new)

Let’s redo the random intercept model and the random slope model really quickly with the fat_new response

mm1_new <- lme(fat_new ~ 1 + log_est + year_f + sex, random = ~ 1 | rosa_f, method = "REML", data = stat_data)

mm2_new <- lme(fat_new ~ 1 + log_est + year_f + sex, random = ~ 0 + log_est | rosa_f, method = "REML", data = stat_data)

LRT random intercept vs GLS

anova(m0_new, mm1_new)

LRT random slope vs GLS

anova(m0_new, mm2_new)

Nothing changed when using the fatness multipled by 10 in model selection. Neither the random slope, random intercept or random slope or intercept are preferred over a non-mixed effects model.

After controlling for estuary sampling date, year and sex effects there is no effect of RoSA genotype on nonwater fraction of liver (adiposity) level. None of the random effects models were preferred over the GLS model without random effects.

Model selection on the fixed effects in the GLS model and see what predictors significantly influence fatness.

M0 <- gls(fatness ~ 1 + log_est + sex + year_f, method = "ML", data = stat_data)
M1 <- gls(fatness ~ 1 + log_est + sex, method = "ML", data = stat_data)
M2 <- gls(fatness ~ 1 + log_est + year_f, method = "ML", data = stat_data)
M3 <- gls(fatness ~ 1 + sex + year_f, method = "ML", data = stat_data)

LRT on year

anova(M1, M0)

LRT on sex

anova(M2, M0)

LRT on log_est

anova(M3, M0)

F-test on full GLS

anova(M0)

All fixed effects are highly significant. log_est has the strongest singnificance with sex and year having similar significance.

Table of effect sizes for each predictor for fatness.

library(sjPlot)
tab_model(M0)
## 'r2()' does not support models of class 'gls'.
  fatness
Predictors Estimates CI p
(Intercept) 0.73 0.65 – 0.81 <0.001
log_est -0.08 -0.10 – -0.07 <0.001
sex [m] 0.03 0.03 – 0.04 <0.001
year_f [2010] 0.03 0.03 – 0.04 <0.001
Observations 502

Session Info

sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.6.2 (2019-12-12)
##  os       macOS Sierra 10.12.6        
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  ctype    en_US.UTF-8                 
##  tz       America/Denver              
##  date     2020-05-14                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version  date       lib source        
##  assertthat     0.2.1    2019-03-21 [1] CRAN (R 3.6.0)
##  backports      1.1.6    2020-04-05 [1] CRAN (R 3.6.2)
##  bayestestR     0.6.0    2020-04-20 [1] CRAN (R 3.6.2)
##  boot           1.3-23   2019-07-05 [2] CRAN (R 3.6.2)
##  broom          0.5.6    2020-04-20 [1] CRAN (R 3.6.2)
##  cellranger     1.1.0    2016-07-27 [1] CRAN (R 3.6.0)
##  cli            2.0.2    2020-02-28 [1] CRAN (R 3.6.0)
##  coda           0.19-3   2019-07-05 [1] CRAN (R 3.6.0)
##  colorspace     1.4-1    2019-03-18 [1] CRAN (R 3.6.0)
##  crayon         1.3.4    2017-09-16 [1] CRAN (R 3.6.0)
##  DBI            1.1.0    2019-12-15 [1] CRAN (R 3.6.0)
##  dbplyr         1.4.3    2020-04-19 [1] CRAN (R 3.6.2)
##  digest         0.6.25   2020-02-23 [1] CRAN (R 3.6.0)
##  dplyr        * 0.8.5    2020-03-07 [1] CRAN (R 3.6.0)
##  effectsize     0.3.0    2020-04-11 [1] CRAN (R 3.6.2)
##  ellipsis       0.3.0    2019-09-20 [1] CRAN (R 3.6.0)
##  emmeans        1.4.6    2020-04-19 [1] CRAN (R 3.6.2)
##  estimability   1.3      2018-02-11 [1] CRAN (R 3.6.0)
##  evaluate       0.14     2019-05-28 [1] CRAN (R 3.6.0)
##  fansi          0.4.1    2020-01-08 [1] CRAN (R 3.6.0)
##  farver         2.0.3    2020-01-16 [1] CRAN (R 3.6.0)
##  forcats      * 0.5.0    2020-03-01 [1] CRAN (R 3.6.0)
##  fs             1.4.1    2020-04-04 [1] CRAN (R 3.6.2)
##  generics       0.0.2    2018-11-29 [1] CRAN (R 3.6.0)
##  ggeffects      0.14.3   2020-04-20 [1] CRAN (R 3.6.2)
##  ggplot2      * 3.3.0    2020-03-05 [1] CRAN (R 3.6.0)
##  glue           1.4.0    2020-04-03 [1] CRAN (R 3.6.2)
##  gtable         0.3.0    2019-03-25 [1] CRAN (R 3.6.0)
##  haven          2.2.0    2019-11-08 [1] CRAN (R 3.6.0)
##  hms            0.5.3    2020-01-08 [1] CRAN (R 3.6.0)
##  htmltools      0.4.0    2019-10-04 [1] CRAN (R 3.6.0)
##  httr           1.4.1    2019-08-05 [1] CRAN (R 3.6.0)
##  insight        0.8.3    2020-04-20 [1] CRAN (R 3.6.2)
##  jsonlite       1.6.1    2020-02-02 [1] CRAN (R 3.6.0)
##  knitr          1.28     2020-02-06 [1] CRAN (R 3.6.0)
##  labeling       0.3      2014-08-23 [1] CRAN (R 3.6.0)
##  lattice        0.20-38  2018-11-04 [2] CRAN (R 3.6.2)
##  lifecycle      0.2.0    2020-03-06 [1] CRAN (R 3.6.0)
##  lme4           1.1-23   2020-04-07 [1] CRAN (R 3.6.2)
##  lubridate      1.7.8    2020-04-06 [1] CRAN (R 3.6.2)
##  magrittr       1.5      2014-11-22 [1] CRAN (R 3.6.0)
##  MASS           7.3-51.6 2020-04-26 [1] CRAN (R 3.6.2)
##  Matrix         1.2-18   2019-11-27 [2] CRAN (R 3.6.2)
##  minqa          1.2.4    2014-10-09 [1] CRAN (R 3.6.0)
##  modelr         0.1.6    2020-02-22 [1] CRAN (R 3.6.0)
##  munsell        0.5.0    2018-06-12 [1] CRAN (R 3.6.0)
##  mvtnorm        1.1-0    2020-02-24 [1] CRAN (R 3.6.0)
##  nlme         * 3.1-142  2019-11-07 [2] CRAN (R 3.6.2)
##  nloptr         1.2.2.1  2020-03-11 [1] CRAN (R 3.6.0)
##  parameters     0.6.1    2020-04-08 [1] CRAN (R 3.6.2)
##  performance    0.4.5    2020-03-28 [1] CRAN (R 3.6.2)
##  pillar         1.4.3    2019-12-20 [1] CRAN (R 3.6.0)
##  pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 3.6.0)
##  purrr        * 0.3.4    2020-04-17 [1] CRAN (R 3.6.2)
##  R6             2.4.1    2019-11-12 [1] CRAN (R 3.6.0)
##  Rcpp           1.0.4    2020-03-17 [1] CRAN (R 3.6.0)
##  readr        * 1.3.1    2018-12-21 [1] CRAN (R 3.6.0)
##  readxl         1.3.1    2019-03-13 [1] CRAN (R 3.6.0)
##  reprex         0.3.0    2019-05-16 [1] CRAN (R 3.6.0)
##  rlang          0.4.5    2020-03-01 [1] CRAN (R 3.6.0)
##  rmarkdown      2.1      2020-01-20 [1] CRAN (R 3.6.0)
##  rstudioapi     0.11     2020-02-07 [1] CRAN (R 3.6.0)
##  rvest          0.3.5    2019-11-08 [1] CRAN (R 3.6.0)
##  scales         1.1.0    2019-11-18 [1] CRAN (R 3.6.0)
##  sessioninfo    1.1.1    2018-11-05 [1] CRAN (R 3.6.0)
##  sjlabelled     1.1.3    2020-01-28 [1] CRAN (R 3.6.0)
##  sjmisc         2.8.4    2020-04-03 [1] CRAN (R 3.6.2)
##  sjPlot       * 2.8.3    2020-03-09 [1] CRAN (R 3.6.0)
##  sjstats        0.17.9   2020-02-06 [1] CRAN (R 3.6.0)
##  statmod        1.4.34   2020-02-17 [1] CRAN (R 3.6.0)
##  stringi        1.4.6    2020-02-17 [1] CRAN (R 3.6.0)
##  stringr      * 1.4.0    2019-02-10 [1] CRAN (R 3.6.0)
##  tibble       * 3.0.1    2020-04-20 [1] CRAN (R 3.6.2)
##  tidyr        * 1.0.2    2020-01-24 [1] CRAN (R 3.6.0)
##  tidyselect     1.0.0    2020-01-27 [1] CRAN (R 3.6.0)
##  tidyverse    * 1.3.0    2019-11-21 [1] CRAN (R 3.6.0)
##  vctrs          0.2.4    2020-03-10 [1] CRAN (R 3.6.0)
##  withr          2.2.0    2020-04-20 [1] CRAN (R 3.6.2)
##  xfun           0.13     2020-04-13 [1] CRAN (R 3.6.2)
##  xml2           1.3.2    2020-04-23 [1] CRAN (R 3.6.2)
##  xtable         1.8-4    2019-04-21 [1] CRAN (R 3.6.0)
##  yaml           2.2.1    2020-02-01 [1] CRAN (R 3.6.0)
## 
## [1] /Users/eriq/Library/R/3.6/library
## [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

Running Time

Running the code and rendering this notebook required approximately this much time on a Mac laptop of middling speed:

Sys.time() - start_time
## Time difference of 3.946509 secs