This is intended for the case where the genotypes in question are composed of alleles that are actually multi-SNP haplotypes obtained from next-generation sequencing data. In other words, all the SNPs occur on a single read, so their phase is known. The model allows for either a per-locus or a per-SNP sequencing error rate. The haplotypes must be named as strings of A, C, G, or T (though they could be strings of any characters; the function does not check this). For now we assume that if a SNP is multiallelic, then genotyping errors to either of the alternate alleles are equally likely. The model currently also assumes that genotyping errors are equally likely in either direction at a SNP.

ge_model_microhap1(
  L,
  miscall_rate = 0.005,
  dropout_rate = 0.005,
  per_snp_rate = FALSE
)

Arguments

L

An element of the list created by long_markers_to_X_l_list. Such an element holds the information at a single locus. The idea here is that every ge_model_* function takes in an object like L, and can then use any piece of information in it about alleles or genotypes to configure a genotyping error model.

miscall_rate

The rate at which microhaplotype alleles are miscalled. If option per_snp_rate is TRUE, then this is the rate at which each SNP might get miscalled, such that the overall miscall rate for microhaplotypes with more SNPs is higher than for microhaplotypes with fewer SNPs.

dropout_rate

Rate of allelic dropout.

per_snp_rate

Logical. If TRUE, then the overall miscall rate for the microhaplotype locus is miscall_rate times the number of SNPs in the microhaplotype. The default is FALSE, and this option is not really recommended.
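
As a rough illustration of the arithmetic described for per_snp_rate = TRUE, the sketch below computes the implied locus-level miscall rate for three-SNP haplotypes. It is not the package's internal code: the variable names are hypothetical, and it assumes that the number of SNPs can be read off as the number of characters in a haplotype name.

```r
# Hypothetical illustration only; not how ge_model_microhap1 is implemented.
# Assumption: each character of a haplotype name corresponds to one SNP.
hap_names <- c("GCA", "ACA", "ACG", "ATA", "GCG", "GTA")

# All haplotypes at a locus should span the same SNPs, so the lengths agree.
n_snps <- unique(nchar(hap_names))
stopifnot(length(n_snps) == 1)

miscall_rate <- 0.005  # the documented default, here read as a per-SNP rate

# Overall miscall rate for the locus as described for per_snp_rate = TRUE:
# miscall_rate times the number of SNPs in the microhaplotype.
overall_miscall_rate <- miscall_rate * n_snps
overall_miscall_rate
```

With per_snp_rate = FALSE (the default), miscall_rate is instead taken directly as the rate for the whole microhaplotype, regardless of how many SNPs it contains.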

Examples

# here is some example information about a microhap:
example_L_microhap
#> $freqs
#>         GCA         ACA         ACG         ATA         GCG         GTA 
#> 0.543478261 0.260869565 0.086956522 0.065217391 0.038043478 0.005434783 
#> 
#> $geno_freqs
#>    GCA / GCA    GCA / ACA    GCA / ACG    GCA / ATA    GCA / GCG    GCA / GTA 
#> 2.953686e-01 2.835539e-01 9.451796e-02 7.088847e-02 4.135161e-02 5.907372e-03 
#>    ACA / ACA    ACA / ACG    ACA / ATA    ACA / GCG    ACA / GTA    ACG / ACG 
#> 6.805293e-02 4.536862e-02 3.402647e-02 1.984877e-02 2.835539e-03 7.561437e-03 
#>    ACG / ATA    ACG / GCG    ACG / GTA    ATA / ATA    ATA / GCG    ATA / GTA 
#> 1.134216e-02 6.616257e-03 9.451796e-04 4.253308e-03 4.962193e-03 7.088847e-04 
#>    GCG / GCG    GCG / GTA    GTA / GTA 
#> 1.447306e-03 4.135161e-04 2.953686e-05 
#> 

# now we can feed it into the function with default parameter values.