Given a collection of genomic simulation pedigrees, requests for how many simulations should be done on each (in the request input), and recombination rates, this function simulates the segregation of segments down through the pedigrees.

segregate(request, RR, MM = NULL)

Arguments

request

a tibble with list columns "gpp" and "reppop". Each element of the "gpp" column is a tibble giving a genomic simulation pedigree, as documented for the input to prep_gsp_for_hap_dropping(). Each element of the "reppop" column is a tibble with columns index, pop, and group, which indicates which of the founding populations ("A", "B", etc.) correspond to which groups (from the group column in, for example, the meta data for individuals in your genotype data set, like the data object I_meta). Because you will likely want to iterate the segregation procedure multiple times in a single simulation, you can request multiple "reps" (replicates) of the procedure by using multiple values in the index column. BIG NOTE: The values in the index column must start at 1 and be dense. In other words, if the max value in the index column is N, then every integer from 1 to N must appear in it.
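The density requirement on the index column can be checked before building a request. The following is a minimal sketch in base R; index_is_dense() is a hypothetical helper written for illustration and is not a function provided by the package.

```r
# Hypothetical helper (not part of the package): returns TRUE when the
# index values start at 1 and include every integer up to their maximum.
index_is_dense <- function(index) {
  vals <- sort(unique(index))
  length(vals) > 0 && identical(as.integer(vals), seq_len(max(vals)))
}

index_is_dense(c(1L, 1L, 2L, 2L))  # acceptable: 1 and 2 are both present
index_is_dense(c(1L, 1L, 3L, 3L))  # not acceptable: 2 is missing
index_is_dense(c(2L, 2L, 3L, 3L))  # not acceptable: does not start at 1
```

Running a check like this on each reppop tibble's index column is one way to catch a malformed request early.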

RR

the recombination rates, in the same format as the package data (like RecRates)

MM

the marker meta data tibble (like M_meta). It may be left as NULL (the default). If it is supplied, the function uses the order of the markers in MM to define the levels of a chrom_f column, so that the rows of the output can be sorted correctly with respect to the markers in the genotype data. This allows the markers to be subscripted out of the genotype matrix more efficiently. If MM is NULL, the function creates chrom_f using the order of the chromosomes in RR. If MM is not NULL, the function also checks that the markers fall within the extent of the recombination rate bins, and throws an error otherwise.
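The idea of ordering chromosomes by their appearance in the marker table can be sketched with toy data. This is only an illustration of the technique, not the package's internal code; the data frames and their column names (chrom, pos, start) are assumptions made for the example.

```r
# Toy marker meta data; the column names are assumed for illustration.
MM_toy <- data.frame(
  chrom = c("17", "17", "12", "18"),
  pos   = c(100, 500, 250, 75)
)

# Factor levels follow the order in which chromosomes first appear among
# the markers, rather than alphabetical order.
chrom_levels <- unique(MM_toy$chrom)

# Toy segments to be sorted into marker order.
segs <- data.frame(chrom = c("12", "18", "17"), start = c(0, 0, 0))
segs$chrom_f <- factor(segs$chrom, levels = chrom_levels)

# Sorting on the factor arranges the rows as 17, 12, 18.
segs_sorted <- segs[order(segs$chrom_f, segs$start), ]
segs_sorted
```

Sorting on a factor whose levels match the marker order keeps the segment rows aligned with the column order of a genotype matrix built from the same markers.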

Value

The output from this function is a tibble. Each row represents one segment of genetic material amongst the sampled individuals from the genomic simulation pedigrees. The columns give information about the provenance and destination of that segment as follows. Each segment exists in one of the samples (samp_index) from a sampled individual with a ped_sample_id, on a given gpp (the index giving the row of the request input tibble), in a given replicate (index). Further, it is on one of two gametes (gamete_index) that segregated into the individual, and it came from a certain founding population (pop_origin) that corresponds to one of the named groups in the genotype file (group_origin). And, of course, the segment occupies the space from start to end on a chromosome chrom. The index of the founder haplotype on the given gpp that this segment descended from is given in rs_founder_haplo, which is short for "rep-specific founder haplotype". This piece of information is crucial for segregating variation from the individuals in the Geno file onto these segments. Finally, the column sim_level_founder_haplo assigns a unique index to each founder haplotype. This is necessary because any simulation can involve multiple gpps and/or indexes of gpps, and the founders in each of those must all be unique within a simulation, so that those haplotypes can all, eventually, be accessed easily out of the genotype matrix.
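To illustrate why a simulation-level index is needed, the sketch below assigns one unique integer to each combination of gpp, index, and rep-specific founder haplotype in a toy table. This is an assumption-laden illustration only: the package may number the haplotypes differently, and the column sim_level_toy is a hypothetical name.

```r
# Toy table: founder haplotypes 1 and 2 appear in each of two replicates
# of the same gpp, so the rep-specific index alone is ambiguous.
toy <- data.frame(
  gpp              = c(1L, 1L, 1L, 1L),
  index            = c(1L, 1L, 2L, 2L),
  rs_founder_haplo = c(1L, 2L, 1L, 2L)
)

# Build a key from the three columns and number the distinct keys in
# order of first appearance (hypothetical numbering scheme).
key <- paste(toy$gpp, toy$index, toy$rs_founder_haplo, sep = "_")
toy$sim_level_toy <- match(key, unique(key))

toy
```

Each row receives its own value in sim_level_toy even though rs_founder_haplo repeats across replicates, which is the property the real sim_level_founder_haplo column is described as providing.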

Examples

# We construct an example here where we will request segregation
# down a GSP with two F1s and F1B backcrosses between two hypothetical
# populations, A and B.
gsp_f1f1b <- create_GSP("A", "B", F1 = TRUE, F1B = TRUE)

# We will imagine that in our marker data there are three groups
# labelled "grp1", "grp2", and "grp3", and we want to create the F1Bs
# with backcrossing only to grp3.
reppop <- tibble::tibble(
    index = as.integer(c(1, 1, 2, 2)),
    pop = c("A", "B", "A", "B"),
    group = c("grp3", "grp1", "grp3", "grp2")
)

# combine those into a request
request <- tibble::tibble(
   gpp = list(gsp_f1f1b),
   reppop = list(reppop)
)


result1 <- segregate(request, RecRates)

# here we pass it some markers, too
result2 <- segregate(request, RecRates, M_meta)

result1
#> # A tibble: 52 × 14
#>    chrom_f   gpp index chrom ped_sample_id samp_index gamete_index
#>    <fct>   <int> <int> <chr> <chr>              <int>        <dbl>
#>  1 12          1     1 12    4                      1            1
#>  2 17          1     1 17    4                      1            1
#>  3 18          1     1 18    4                      1            1
#>  4 12          1     1 12    4                      1            2
#>  5 17          1     1 17    4                      1            2
#>  6 18          1     1 18    4                      1            2
#>  7 12          1     1 12    5                      1            1
#>  8 17          1     1 17    5                      1            1
#>  9 18          1     1 18    5                      1            1
#> 10 12          1     1 12    5                      1            2
#> # ℹ 42 more rows
#> # ℹ 7 more variables: gamete_segments <list>, pop_origin <chr>,
#> #   rs_founder_haplo <int>, start <dbl>, end <dbl>, group_origin <chr>,
#> #   sim_level_founder_haplo <int>

result2
#> # A tibble: 48 × 14
#>    chrom_f   gpp index chrom ped_sample_id samp_index gamete_index
#>    <fct>   <int> <int> <chr> <chr>              <int>        <dbl>
#>  1 12          1     1 12    4                      1            1
#>  2 17          1     1 17    4                      1            1
#>  3 18          1     1 18    4                      1            1
#>  4 12          1     1 12    4                      1            2
#>  5 17          1     1 17    4                      1            2
#>  6 18          1     1 18    4                      1            2
#>  7 12          1     1 12    5                      1            1
#>  8 17          1     1 17    5                      1            1
#>  9 18          1     1 18    5                      1            1
#> 10 12          1     1 12    5                      1            2
#> # ℹ 38 more rows
#> # ℹ 7 more variables: gamete_segments <list>, pop_origin <chr>,
#> #   rs_founder_haplo <int>, start <dbl>, end <dbl>, group_origin <chr>,
#> #   sim_level_founder_haplo <int>