Given a collection of genomic simulation pedigrees, requests for how many
simulations should be done (in the request input), and recombination rates,
this function simulates the segregation of segments down through the pedigrees.
segregate(request, RR, MM = NULL)
The request argument is a tibble with list columns "gpp" and "reppop".
Each element of the gpp column is a tibble giving a genomic simulation
pedigree, as documented for the input to prep_gsp_for_hap_dropping().
Each element of the "reppop" column is a tibble with columns index, pop, and
group, which indicate which of the founding populations ("A", "B", etc.)
correspond to the different groups (from the group column in, for example,
the meta data for individuals in your genotype data set, like the data
object I_meta).
Because you will likely wish to iterate the segregation procedure multiple
times in a single simulation, you can specify that by requesting multiple
"reps" (replicates) of the procedure. BIG NOTE: The values in the index column
that you choose must start at 1 and must be dense. In other words, if
the max value in the index column is N, then every integer from 1 to N
must be in there.
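The density requirement on the index column can be checked before building a request. The following is a minimal sketch in base R; the helper name index_is_dense is hypothetical and is not part of the package, and a plain data frame stands in for the reppop tibble.

```r
# Hypothetical helper (not part of the package): returns TRUE when the
# index column of a reppop table starts at 1 and has no gaps.
index_is_dense <- function(reppop) {
  idx <- sort(unique(reppop$index))
  length(idx) > 0 && identical(as.integer(idx), seq_len(max(idx)))
}

good <- data.frame(
  index = as.integer(c(1, 1, 2, 2)),
  pop   = c("A", "B", "A", "B"),
  group = c("grp3", "grp1", "grp3", "grp2")
)

bad <- good
bad$index <- as.integer(c(1, 1, 3, 3))  # skips 2, so it is not dense

index_is_dense(good)
index_is_dense(bad)
```

Running a check like this on every element of the reppop list column catches a gapped index before any simulation work is done.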
The RR argument gives the recombination rates in the format of the package data
(for example, the RecRates object used in the examples).
The MM argument is the marker meta data tibble (like M_meta). It may be left
as NULL (the default). If it is supplied, the order of the markers in MM
defines the levels of a chrom_f column so that the rows of the output can be
sorted correctly with respect to the markers in the Genotype data frame. This
allows the markers to be subscripted out of the matrix more efficiently. If MM
is NULL, the function creates chrom_f using the order of the chromosomes in
RR. If MM is not NULL, the function also checks that the markers are within
the extent of the recombination rate bins, and gives an error otherwise.
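The two roles of MM described above (setting the chromosome order and checking marker positions against the recombination bins) can be illustrated with toy data. This is only a sketch of the idea, not the function's internal code. The column names chrom, pos, start_pos, and end_pos are assumptions made for the illustration, and the data frames are invented stand-ins for MM and RR.

```r
# Toy stand-ins for MM and RR; the column names are assumptions.
MM <- data.frame(
  chrom = c("17", "17", "12", "18"),
  pos   = c(1000, 25000, 4000, 9000)
)
RR <- data.frame(
  chrom     = c("12", "12", "17", "18"),
  start_pos = c(1, 10001, 1, 1),
  end_pos   = c(10000, 20000, 30000, 10000)
)

# Chromosome order follows first appearance in MM if given, else in RR.
chrom_levels <- if (!is.null(MM)) unique(MM$chrom) else unique(RR$chrom)
chrom_f <- factor(RR$chrom, levels = chrom_levels)

# Extent of the recombination bins on each chromosome.
lo <- tapply(RR$start_pos, RR$chrom, min)
hi <- tapply(RR$end_pos, RR$chrom, max)

# Flag markers lying outside the bins of their own chromosome.
outside <- MM$pos < lo[MM$chrom] | MM$pos > hi[MM$chrom]
if (any(outside)) stop("Some markers fall outside the recombination rate bins")

levels(chrom_f)
```

With these toy inputs the factor levels follow the marker order ("17", "12", "18") rather than the order of chromosomes in RR, so sorting on chrom_f would line rows up with the marker order.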
The output from this function is a tibble. Each row represents one segment of genetic
material among the sampled individuals from the genomic simulation pedigrees. The columns give
information about the provenance and destination of that segment as follows.
Each segment exists in one of the samples (samp_index) taken from a sampled
individual with a given ped_sample_id, on a given gpp (the index giving the row
of the request input tibble), in a given rep (index). Further, it is on one of
the two gametes (gamete_index) that segregated into the individual, and it came
from a certain founding population (pop_origin) that corresponds to one of the
named groups in the genotype file (group_origin). And, of course, the segment
occupies the space from start to end on a chromosome chrom.
The index of the founder haplotype on the given gpp that this segment descended from is
given in rs_founder_haplo, which is short for "rep-specific founder haplotype". This
piece of information is crucial for segregating variation from the individuals in the Geno
file onto these segments. Finally, the column sim_level_founder_haplo assigns a unique
index to each founder haplotype. This is necessary because any simulation can involve
multiple gpps and/or indexes of gpps, and the founders in each of those must all be unique
within a simulation, so that those haplotypes can all, eventually, be accessed easily out
of the genotype matrix.
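To see why a simulation-wide founder index is needed, note that rep-specific haplotype numbers repeat across reps. The sketch below uses base R on an invented table. It shows one possible way to number the distinct (gpp, index, rs_founder_haplo) combinations, and the package's actual numbering scheme may differ. The column name sim_wide_id is hypothetical.

```r
# Toy table: two reps (index 1 and 2) of one gpp, each with founder
# haplotypes numbered 1 to 4 within the rep.
segs <- data.frame(
  gpp              = c(1, 1, 1, 1, 1, 1, 1, 1),
  index            = c(1, 1, 1, 1, 2, 2, 2, 2),
  rs_founder_haplo = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# One possible simulation-wide identifier: number the distinct
# (gpp, index, rs_founder_haplo) combinations in order of first appearance.
key <- paste(segs$gpp, segs$index, segs$rs_founder_haplo, sep = "_")
segs$sim_wide_id <- match(key, unique(key))

segs
```

Here haplotype 1 of rep 1 and haplotype 1 of rep 2 receive different identifiers, so no founder haplotype would be used twice within the simulation.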
# We construct an example here where we will request segregation
# down a GSP with two F1s and F1B backcrosses between two hypothetical
# populations, A and B.
gsp_f1f1b <- create_GSP("A", "B", F1 = TRUE, F1B = TRUE)
# We will imagine that in our marker data there are three groups
# labelled "grp1", "grp2", and "grp3", and we want to create the F1Bs with backcrossing
# only to grp3.
reppop <- tibble::tibble(
  index = as.integer(c(1, 1, 2, 2)),
  pop = c("A", "B", "A", "B"),
  group = c("grp3", "grp1", "grp3", "grp2")
)
# combine those into a request
request <- tibble::tibble(
  gpp = list(gsp_f1f1b),
  reppop = list(reppop)
)
result1 <- segregate(request, RecRates)
# here we pass it some markers, too
result2 <- segregate(request, RecRates, M_meta)
result1
#> # A tibble: 52 × 14
#> chrom_f gpp index chrom ped_sample_id samp_index gamete_index
#> <fct> <int> <int> <chr> <chr> <int> <dbl>
#> 1 12 1 1 12 4 1 1
#> 2 17 1 1 17 4 1 1
#> 3 18 1 1 18 4 1 1
#> 4 12 1 1 12 4 1 2
#> 5 17 1 1 17 4 1 2
#> 6 18 1 1 18 4 1 2
#> 7 12 1 1 12 5 1 1
#> 8 17 1 1 17 5 1 1
#> 9 18 1 1 18 5 1 1
#> 10 12 1 1 12 5 1 2
#> # ℹ 42 more rows
#> # ℹ 7 more variables: gamete_segments <list>, pop_origin <chr>,
#> # rs_founder_haplo <int>, start <dbl>, end <dbl>, group_origin <chr>,
#> # sim_level_founder_haplo <int>
result2
#> # A tibble: 48 × 14
#> chrom_f gpp index chrom ped_sample_id samp_index gamete_index
#> <fct> <int> <int> <chr> <chr> <int> <dbl>
#> 1 12 1 1 12 4 1 1
#> 2 17 1 1 17 4 1 1
#> 3 18 1 1 18 4 1 1
#> 4 12 1 1 12 4 1 2
#> 5 17 1 1 17 4 1 2
#> 6 18 1 1 18 4 1 2
#> 7 12 1 1 12 5 1 1
#> 8 17 1 1 17 5 1 1
#> 9 18 1 1 18 5 1 1
#> 10 12 1 1 12 5 1 2
#> # ℹ 38 more rows
#> # ℹ 7 more variables: gamete_segments <list>, pop_origin <chr>,
#> # rs_founder_haplo <int>, start <dbl>, end <dbl>, group_origin <chr>,
#> # sim_level_founder_haplo <int>
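Because each output row is a segment with a start, an end, and a pop_origin, a table of this shape can be summarised into ancestry fractions. The following is a minimal sketch in base R on invented values, not output from segregate(). It uses only column names that appear in the printed results above.

```r
# Toy segments for one sampled individual on one chromosome; values are made up.
segs <- data.frame(
  ped_sample_id = c("4", "4", "4"),
  gamete_index  = c(1, 1, 2),
  pop_origin    = c("A", "B", "B"),
  start         = c(0, 4e6, 0),
  end           = c(4e6, 1e7, 1e7)
)

# Length of each segment, then total length per founding population.
segs$len <- segs$end - segs$start
tot <- tapply(segs$len, segs$pop_origin, sum)

# Fraction of the carried genome that traces to each population.
frac <- tot / sum(tot)
frac
```

In this toy case the first gamete switches from A to B at position 4e6 and the second gamete is entirely B, so 20% of the total segment length traces to A and 80% to B.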