simulate_and_calc_Q.Rd
Once the Y_l and Y_l_true matrices have been inserted into the list of locus computations, then it is time to simulate multilocus genotype pairs under a variety of relationships, and then compute the log probability of them assuming various relationships.
simulate_and_calc_Q(
YL,
reps = 10^4,
froms = NULL,
tos = NULL,
df = NULL,
pedigrees = NULL,
forceLinkagePO = FALSE,
miss_mask_mat = NULL,
rando_miss_wts = NULL,
rando_miss_n = 0
)
the list that holds the Y_l and Y_l_true matrices
number of pairs to simulate for each relationship in froms. This recycles to be the same length as froms, but you can specify different numbers of reps for each of the relationships.
a vector of names of the relationship IDs to simulate from. If this is NULL then the function will just simulate from all the values that have had Y_l_true matrices computed for them in the first component of YL. Genotype values get simulated from the Y_l_true values
a vector of names of the relationship IDs to calculate the genotype log probs of the simulated genotypes from. Genotype log probs are calculated using the Y_l matrices. If this is NULL then the function will just compute probs for all the relationships that have had Y_l matrices computed for them in YL.
A data frame of the original long markers.
A list in the format of pedigrees
that includes the pedigrees at least for
each relationship in froms
. If this is non-null, then the simulation will be done of the markers
as physically linked using MENDEL. If it is NULL, then the pedigrees are ignored and the simulation
is done assuming that the markers are all unlinked.
Logical. If TRUE, the genotypes for the parent-offspring relationship are simulated with linkage even though it doesn't make a difference so long as the markers themselves are not in LD in the population. This is primarily useful for testing.
A logical matrix with length(YL) columns and reps rows. The (r,c)-th is TRUE if the c-th locus should be considered missing in the r-th simulated sample. This type of specification lets the user simulate either a specific pattern of missingness, if desired, or to simulate patterns of missing data given missing data rates, etc. (Need to make another function to do that.)
weights to be given to different loci that influence whether they will be one of the rando_miss_n missing loci in any iteration. These will be recycled (or truncated) to have length equal to the number of loci, and they will be normalized to sum to one as appropriate (so you can provide them in unnormalized form.) The idea of this is to be able to use observed rates of missingness amongst loci to mask some loci as missing. Given as a comma-delimited string in column "rando_miss_wts" in the output.
a single number less than the number of loci. Each iteration, rando_miss_n loci will be considered missing, according to the rando_miss_wts. This let's you get a sense for how well you will do, on average, with a certain number of missing loci.
This returns a list with components that are the relationships that were simulated from. Inside each of those components is a list with components referring to the relationships that had genotype log probabilities calculated (the "tos"). The contents of each is a vector of length reps which are the log probability of the multilocus genotypes of that each of reps simulated pairs.
example(insert_Y_l_matrices, package = "CKMRsim")
#>
#> in_Y__> example(insert_C_l_matrices)
#>
#> in_C__> example(long_markers_to_X_l_list, package = "CKMRsim")
#>
#> l___X_> data(kappas)
#>
#> l___X_> lm_example <- long_markers_to_X_l_list(long_markers[1:18,], kappa_matrix = kappas)
#>
#> l___X_> mh_example <- long_markers_to_X_l_list(microhaps, kappa_matrix = kappas)
#>
#> in_C__> mh_cl_example <- insert_C_l_matrices(
#> in_C__+ mh_example,
#> in_C__+ ge_mod_assumed = ge_model_microhap1,
#> in_C__+ ge_mod_true = ge_model_microhap1
#> in_C__+ )
#>
#> in_Y__> mh_yl_example <- insert_Y_l_matrices(mh_cl_example)
mh_sacQ_example <- simulate_and_calc_Q(
mh_yl_example,
10^4,
froms = c("U", "PO"),
tos = c("U", "PO")
)
#> Simulating unlinked markers from Y_l_true matrices for relationship: U
#> Simulating unlinked markers from Y_l_true matrices for relationship: PO
if (FALSE) {
# and we could plot the distribution of those if we wanted to
m <- mh_sacQ_example
D <- data.frame(U = m$U$PO - m$U$U, PO = m$PO$PO - m$PO$U)
DD <- tidyr::gather(D, Relat, logL)
library(ggplot2)
ggplot(DD, aes(x = logL, fill = Relat)) + geom_density(alpha = 0.6)
}