This notebook does a very simple test to make sure that my segregation functions are working out correctly. I am going to assign wild and farmed totally different allelic types (1 vs 2) at all loci and then look at the fraction of different alleles amongst the simulated hybrids.
First we ensure that we load each of the functions:
library(tidyverse)
Loading tidyverse: ggplot2
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Loading tidyverse: dplyr
Conflicts with tidy packages ----------------------------------------------------------------------------------------------------------
filter(): dplyr, stats
lag(): dplyr, stats
library(stringr)
library(SalarHybPower)
We also want to have some example data. We will just use the Nova Scotia data. But we will call this dat:
dat <- readRDS("intermediates/01/tidy-west.rds") %>%
mutate(pop = str_sub(id, 1, 3),
group = ifelse(pop == "AQU" | pop == "WLN", "farmed", "wild"))
Now, we will just change alleles to 1 or 2 on the basis of the group:
fake_dat <- dat %>%
mutate(allele = ifelse(group == "farmed", 1, 2))
And, of course, we won’t worry about ranking these things, we will just take all of them:
fake_markers <- fake_dat %>%
group_by(chrom, coord, variant) %>%
tally() %>%
ungroup() %>%
select(-n) %>%
mutate(epp = 0.5, selectable = TRUE)
fakeV <- fake_markers$variant
Now we can make a group of wild and a group of farmed
Wf <- fake_dat %>%
filter(group == "wild") %>%
SalarHybPower:::scramble_founder_haplotypes(., fakeV)
Ff <- fake_dat %>%
filter(group == "farmed") %>%
SalarHybPower:::scramble_founder_haplotypes(., fakeV)
Then we can make hybrids out of those and count up the number of 1’s and 2’s in each individual.
cats <- c("PureF", "PureW", "F1", "F2", "BX_wild", "BX_farmed")
names(cats) <- cats
wild_fracts <- lapply(cats, function(hyb_cat) {
Hybs <- switch(hyb_cat,
PureF = SalarHybPower:::make_pures(Ff),
PureW = SalarHybPower:::make_pures(Wf),
F1 = SalarHybPower:::make_f1s(Wf, Ff),
F2 = SalarHybPower:::make_f2s(Wf, Ff),
BX_wild = SalarHybPower:::make_bxs(Wf, Ff),
BX_farmed = SalarHybPower:::make_bxs(Ff, Wf),
stop("unknown hyb_cat: ", hyb_cat)
)
Hybs %>%
mutate(genotype_cat = ifelse(!near(hap1, hap2), 1, ifelse(near(hap1, 1), 0, 2))) %>%
mutate(genotype_cat = as.integer(genotype_cat))
})
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
wf_df <- wild_fracts %>%
bind_rows(.id = "hyb_cat")
Now we can analyze those:
gcat_props <- wf_df %>%
group_by(hyb_cat, id, genotype_cat) %>%
tally() %>%
mutate(prop = n / sum(n)) %>%
ungroup() %>%
mutate(genotype_cat = factor(genotype_cat))
And now we should be able to make boxplots of the proportions of different types of genotypes in the individuals. 0
= both copies are from farmed, 1
= heterozygote (one farmed and one wild allele), and 2
is a locus with both gene copies from the wild population.
ggplot(gcat_props, aes(x = genotype_cat, y = prop, colour = hyb_cat)) +
geom_boxplot() +
facet_wrap(~hyb_cat)
That looks right. Now, I think it would be great to give each founder individual’s haplotype a separate allele and then run through this so that we can easily count up how many recombination breaks there are on the resulting chromosomes.
I am getting weird results with create_hybrid_dataset()
. Let’s see if we can duplicate it here:
SAR <- split_and_rank(fake_dat)
Joining, by = c("id", "pop", "group")
AllPops <- c("BDN", "CNR", "GRR", "LHR", "LPR")
create_hybrid_dataset(SAR, wild_pop = AllPops, hyb_cat = "F1", L = 5000, dir = "hyb_dir")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
'hyb_dir' already exists
The data here look right, but newhybs chokes on it.
Try a smaller data set:
create_hybrid_dataset(SAR, wild_pop = AllPops, hyb_cat = "F1", L = 50, dir = "hyb_dir")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
'hyb_dir' already exists
That gives reasonable results. So, I think we have a weird underflow problem in newhybrids, as this chokes:
create_hybrid_dataset(SAR, wild_pop = AllPops, hyb_cat = "F1", L = 2000, dir = "hyb_dir")
Joining, by = "id"
Joining, by = "id"
Joining, by = c("id_A", "gam_A")
Joining, by = c("id_B", "gam_B", "variant", "chrom", "coord")
'hyb_dir' already exists
Yep, it is the dreaded:
WARNING: i=6 >= Num=6 in IntegerFromProbsRV(). Probably only a problem while initializing the chain. i reset to 0
which occurs when we have too many markers.