library(tidyverse)
load RoSA genotype data and add estuary metadata to it.
rosa_data <- read_rds("./data/101-RoSA-klamath-estuary-samples.rds")
estuary_meta <- read_csv("./data/102-Klamath-entryDATA_withNMFSID.csv")
## Warning: Missing column names filled in: 'X16' [16]
rosa_meta <- left_join(estuary_meta, rosa_data, by = c("NMFS ID" = "NMFS_DNA_ID")) %>%
dplyr::select(-ID) %>%
rename(NMFS_DNA_ID = "NMFS ID") %>%
dplyr::select(NMFS_DNA_ID, rosa, everything())
Remove all samples with ANY missing data. And remove the recombinant genotypes.
rosa_stats <- rosa_meta %>%
filter(
!str_detect(rosa, "\\?"),
rosa %in% c("EEEEEEEE", "HHHHHHHH", "LLLLLLLL")
)
plot fatness as function of return date
rosa_stats %>%
mutate(rosa_f = factor(rosa, levels = c("EEEEEEEE", "HHHHHHHH", "LLLLLLLL"))) %>%
ggplot(., aes(x = julian, y = drywet, fill = rosa_f, colour = rosa_f)) +
geom_point() +
xlab(label = "Estuary sampling date") +
ylab(label = "non-water fraction") +
scale_fill_discrete(name = "Genotype") +
scale_colour_discrete(name = "Genotype") +
theme_bw() +
scale_x_continuous(breaks = c(121, 152, 182, 213, 244, 274, 305), labels = c(
"May-01", "June-01",
"July-01", "Aug-01", "Sept-01", "Oct-01", "Nov-01"
)) +
theme(
axis.text = element_text(size = 12),
axis.title = element_text(size = 14), axis.text.x = element_text(angle = 45, hjust = 1)
) +
guides(
fill = FALSE,
colour = FALSE
) +
facet_grid(year ~ sex)
stat_data <- rosa_stats %>%
mutate(
sex_f = factor(sex, levels = c("f", "m")),
year_f = factor(year, levels = c("2009", "2010")),
rosa_f = factor(rosa, levels = c("EEEEEEEE", "HHHHHHHH", "LLLLLLLL")),
log_est = log(julian),
log_est2 = log(julian)^2
) %>%
rename(fatness = drywet)
library(nlme)
M0 <- gls(fatness ~ 1 + log_est + sex + year_f, method = "REML", data = stat_data)
e1 <- resid(M0, type = "normalized")
plot(e1 ~ log_est, stat_data)
plot(e1 ~ sex_f, stat_data)
plot(e1 ~ year_f, stat_data)
hist(e1, main = "Histogram of residuals") # looks ok
None of that is concerning or indicative of violations of any assumptions.
mm1 <- lme(fatness ~ 1 + log_est + sex + year_f, random = ~ 1 | rosa_f, method = "REML", data = stat_data) # random intercept model
likelihood ratio test on random intercept vs GLS model
anova(M0, mm1)
Adding a random intercept for RoSA does not improve the model. When forcing the 3 RoSA genotypes to have the same slope (same effect of estuary date on fatness) there is no support for differing intercepts (i.e. all RoSA genotypes start with the same fatness at day 0).
Lets see what happen when a random intercept and slope for RoSA are included. A random slope would indicate that the strength of the relationship between fatness and estuary entry date is not the same among RoSA genotypes.
mm2 <- lme(fatness ~ 1 + log_est + year_f + sex, random = ~ 0 + log_est | rosa_f, method = "REML", data = stat_data)
LRT on random slope vs GLS
anova(M0, mm2)
A random slope model is not preferred over the non-random model.
Let’s see if the random slope and intercept for RoSA is the most preferred model
lme(fatness ~ 1 + log_est + sex + year_f, random = ~ 1 + log_est | rosa_f, method = "REML", data = stat_data)
## Error in lme.formula(fatness ~ 1 + log_est + sex + year_f, random = ~1 + : nlminb problem, convergence error code = 1
## message = iteration limit reached without convergence (10)
That error is due to singularity and not a convergence limit. I increased the number of iterations to 1000 (50 is the default) and it returns the same error. The singularity is because the model is overfit OR because th variance is too damn small for the model to converge. If the model is overfit then that tells us the random slope and intercept model wouldn’t be preferred over the GLS model. Let’s multiply fatness by 10 to increase the magnitude of the variance and see if that gets us a model than converges.
stat_data <- stat_data %>%
mutate(fat_new = fatness * 10)
mm3_new <- lme(fat_new ~ 1 + log_est + sex + year_f, random = ~ 1 + log_est | rosa_f, method = "REML", data = stat_data)
m0_new <- gls(fat_new ~ 1 + log_est + sex + year_f, method = "REML", data = stat_data)
Model selection using liklihood ratio test using the original fatness values multiplied by 10 to increase variance value and get the variance estimate to not be zero.
LRT random slope and intercept VS GLS
anova(m0_new, mm3_new)
Let’s redo the random intercept model and the random slope model really quickly with the fat_new response
mm1_new <- lme(fat_new ~ 1 + log_est + year_f + sex, random = ~ 1 | rosa_f, method = "REML", data = stat_data)
mm2_new <- lme(fat_new ~ 1 + log_est + year_f + sex, random = ~ 0 + log_est | rosa_f, method = "REML", data = stat_data)
LRT random intercept vs GLS
anova(m0_new, mm1_new)
LRT random slope vs GLS
anova(m0_new, mm2_new)
Nothing changed when using the fatness multipled by 10 in model selection. Neither the random slope, random intercept or random slope or intercept are preferred over a non-mixed effects model.
M0 <- gls(fatness ~ 1 + log_est + sex + year_f, method = "ML", data = stat_data)
M1 <- gls(fatness ~ 1 + log_est + sex, method = "ML", data = stat_data)
M2 <- gls(fatness ~ 1 + log_est + year_f, method = "ML", data = stat_data)
M3 <- gls(fatness ~ 1 + sex + year_f, method = "ML", data = stat_data)
LRT on year
anova(M1, M0)
LRT on sex
anova(M2, M0)
LRT on log_est
anova(M3, M0)
F-test on full GLS
anova(M0)
All fixed effects are highly significant. log_est has the strongest singnificance with sex and year having similar significance.
Table of effect sizes for each predictor for fatness.
library(sjPlot)
tab_model(M0)
## 'r2()' does not support models of class 'gls'.
fatness | |||
---|---|---|---|
Predictors | Estimates | CI | p |
(Intercept) | 0.73 | 0.65 – 0.81 | <0.001 |
log_est | -0.08 | -0.10 – -0.07 | <0.001 |
sex [m] | 0.03 | 0.03 – 0.04 | <0.001 |
year_f [2010] | 0.03 | 0.03 – 0.04 | <0.001 |
Observations | 502 |
sessioninfo::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.2 (2019-12-12)
## os macOS Sierra 10.12.6
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/Denver
## date 2020-05-14
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
## backports 1.1.6 2020-04-05 [1] CRAN (R 3.6.2)
## bayestestR 0.6.0 2020-04-20 [1] CRAN (R 3.6.2)
## boot 1.3-23 2019-07-05 [2] CRAN (R 3.6.2)
## broom 0.5.6 2020-04-20 [1] CRAN (R 3.6.2)
## cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0)
## cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
## coda 0.19-3 2019-07-05 [1] CRAN (R 3.6.0)
## colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
## DBI 1.1.0 2019-12-15 [1] CRAN (R 3.6.0)
## dbplyr 1.4.3 2020-04-19 [1] CRAN (R 3.6.2)
## digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
## dplyr * 0.8.5 2020-03-07 [1] CRAN (R 3.6.0)
## effectsize 0.3.0 2020-04-11 [1] CRAN (R 3.6.2)
## ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
## emmeans 1.4.6 2020-04-19 [1] CRAN (R 3.6.2)
## estimability 1.3 2018-02-11 [1] CRAN (R 3.6.0)
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0)
## fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
## farver 2.0.3 2020-01-16 [1] CRAN (R 3.6.0)
## forcats * 0.5.0 2020-03-01 [1] CRAN (R 3.6.0)
## fs 1.4.1 2020-04-04 [1] CRAN (R 3.6.2)
## generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
## ggeffects 0.14.3 2020-04-20 [1] CRAN (R 3.6.2)
## ggplot2 * 3.3.0 2020-03-05 [1] CRAN (R 3.6.0)
## glue 1.4.0 2020-04-03 [1] CRAN (R 3.6.2)
## gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
## haven 2.2.0 2019-11-08 [1] CRAN (R 3.6.0)
## hms 0.5.3 2020-01-08 [1] CRAN (R 3.6.0)
## htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
## httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0)
## insight 0.8.3 2020-04-20 [1] CRAN (R 3.6.2)
## jsonlite 1.6.1 2020-02-02 [1] CRAN (R 3.6.0)
## knitr 1.28 2020-02-06 [1] CRAN (R 3.6.0)
## labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0)
## lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.2)
## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.0)
## lme4 1.1-23 2020-04-07 [1] CRAN (R 3.6.2)
## lubridate 1.7.8 2020-04-06 [1] CRAN (R 3.6.2)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
## MASS 7.3-51.6 2020-04-26 [1] CRAN (R 3.6.2)
## Matrix 1.2-18 2019-11-27 [2] CRAN (R 3.6.2)
## minqa 1.2.4 2014-10-09 [1] CRAN (R 3.6.0)
## modelr 0.1.6 2020-02-22 [1] CRAN (R 3.6.0)
## munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
## mvtnorm 1.1-0 2020-02-24 [1] CRAN (R 3.6.0)
## nlme * 3.1-142 2019-11-07 [2] CRAN (R 3.6.2)
## nloptr 1.2.2.1 2020-03-11 [1] CRAN (R 3.6.0)
## parameters 0.6.1 2020-04-08 [1] CRAN (R 3.6.2)
## performance 0.4.5 2020-03-28 [1] CRAN (R 3.6.2)
## pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.0)
## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0)
## purrr * 0.3.4 2020-04-17 [1] CRAN (R 3.6.2)
## R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.0)
## Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.0)
## readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
## readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0)
## reprex 0.3.0 2019-05-16 [1] CRAN (R 3.6.0)
## rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.0)
## rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.0)
## rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.0)
## rvest 0.3.5 2019-11-08 [1] CRAN (R 3.6.0)
## scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.0)
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
## sjlabelled 1.1.3 2020-01-28 [1] CRAN (R 3.6.0)
## sjmisc 2.8.4 2020-04-03 [1] CRAN (R 3.6.2)
## sjPlot * 2.8.3 2020-03-09 [1] CRAN (R 3.6.0)
## sjstats 0.17.9 2020-02-06 [1] CRAN (R 3.6.0)
## statmod 1.4.34 2020-02-17 [1] CRAN (R 3.6.0)
## stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.0)
## stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
## tibble * 3.0.1 2020-04-20 [1] CRAN (R 3.6.2)
## tidyr * 1.0.2 2020-01-24 [1] CRAN (R 3.6.0)
## tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.0)
## tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 3.6.0)
## vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.0)
## withr 2.2.0 2020-04-20 [1] CRAN (R 3.6.2)
## xfun 0.13 2020-04-13 [1] CRAN (R 3.6.2)
## xml2 1.3.2 2020-04-23 [1] CRAN (R 3.6.2)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.0)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.0)
##
## [1] /Users/eriq/Library/R/3.6/library
## [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
Running the code and rendering this notebook required approximately this much time on a Mac laptop of middling speed:
Sys.time() - start_time
## Time difference of 3.946509 secs