Once the Y_l and Y_l_true matrices have been inserted into the list of locus computations, then it is time to simulate multilocus genotype pairs under a variety of relationships, and then compute the log probability of them assuming various relationships.

simulate_and_calc_Q(
  YL,
  reps = 10^4,
  froms = NULL,
  tos = NULL,
  df = NULL,
  pedigrees = NULL,
  forceLinkagePO = FALSE,
  miss_mask_mat = NULL,
  rando_miss_wts = NULL,
  rando_miss_n = 0
)

Arguments

YL

the list that holds the Y_l and Y_l_true matrices

reps

number of pairs to simulate for each relationship in froms. This recycles to be the same length as froms, but you can specify different numbers of reps for each of the relationships.

froms

a vector of names of the relationship IDs to simulate from. If this is NULL then the function will just simulate from all the values that have had Y_l_true matrices computed for them in the first component of YL. Genotype values get simulated from the Y_l_true values

tos

a vector of names of the relationship IDs to calculate the genotype log probs of the simulated genotypes from. Genotype log probs are calculated using the Y_l matrices. If this is NULL then the function will just compute probs for all the relationships that have had Y_l matrices computed for them in YL.

df

A data frame of the original long markers.

pedigrees

A list in the format of pedigrees that includes the pedigrees at least for each relationship in froms. If this is non-null, then the simulation will be done of the markers as physically linked using MENDEL. If it is NULL, then the pedigrees are ignored and the simulation is done assuming that the markers are all unlinked.

forceLinkagePO

Logical. If TRUE, the genotypes for the parent-offspring relationship are simulated with linkage even though it doesn't make a difference so long as the markers themselves are not in LD in the population. This is primarily useful for testing.

miss_mask_mat

A logical matrix with length(YL) columns and reps rows. The (r,c)-th is TRUE if the c-th locus should be considered missing in the r-th simulated sample. This type of specification lets the user simulate either a specific pattern of missingness, if desired, or to simulate patterns of missing data given missing data rates, etc. (Need to make another function to do that.)

rando_miss_wts

weights to be given to different loci that influence whether they will be one of the rando_miss_n missing loci in any iteration. These will be recycled (or truncated) to have length equal to the number of loci, and they will be normalized to sum to one as appropriate (so you can provide them in unnormalized form.) The idea of this is to be able to use observed rates of missingness amongst loci to mask some loci as missing. Given as a comma-delimited string in column "rando_miss_wts" in the output.

rando_miss_n

a single number less than the number of loci. Each iteration, rando_miss_n loci will be considered missing, according to the rando_miss_wts. This let's you get a sense for how well you will do, on average, with a certain number of missing loci.

Value

This returns a list with components that are the relationships that were simulated from. Inside each of those components is a list with components referring to the relationships that had genotype log probabilities calculated (the "tos"). The contents of each is a vector of length reps which are the log probability of the multilocus genotypes of that each of reps simulated pairs.

Examples

example(insert_Y_l_matrices, package = "CKMRsim")
#> 
#> in_Y__> example(insert_C_l_matrices)
#> 
#> in_C__> example(long_markers_to_X_l_list, package = "CKMRsim")
#> 
#> l___X_> data(kappas)
#> 
#> l___X_> lm_example <- long_markers_to_X_l_list(long_markers[1:18,], kappa_matrix = kappas)
#> 
#> l___X_> mh_example <- long_markers_to_X_l_list(microhaps, kappa_matrix = kappas)
#> 
#> in_C__> mh_cl_example <- insert_C_l_matrices(
#> in_C__+     mh_example,
#> in_C__+     ge_mod_assumed = ge_model_microhap1,
#> in_C__+     ge_mod_true = ge_model_microhap1
#> in_C__+ )
#> 
#> in_Y__> mh_yl_example <- insert_Y_l_matrices(mh_cl_example)
mh_sacQ_example <- simulate_and_calc_Q(
   mh_yl_example,
   10^4,
   froms = c("U", "PO"),
   tos = c("U", "PO")
   )
#> Simulating unlinked markers from Y_l_true matrices for relationship: U
#> Simulating unlinked markers from Y_l_true matrices for relationship: PO

if (FALSE) {
# and we could plot the distribution of those if we wanted to
m <- mh_sacQ_example
D <- data.frame(U = m$U$PO - m$U$U, PO = m$PO$PO - m$PO$U)
DD <- tidyr::gather(D, Relat, logL)
library(ggplot2)
ggplot(DD, aes(x = logL, fill = Relat)) + geom_density(alpha = 0.6)
}