ge_model_microhap1.Rd
This model is intended for genotypes whose alleles are multi-SNP haplotypes (microhaplotypes) obtained from next-generation sequencing data. In other words, all the SNPs occur on a single read, so their phase is known. The model allows for either a per-locus or a per-SNP sequencing error rate. The haplotypes must be named as strings of A, C, G, or T (though they could be strings of any characters, because the function does not check them). For now, the model assumes that if a SNP is multiallelic, then genotyping errors to any of the alternate alleles are equally likely. It also currently assumes that genotyping errors at a SNP are equally likely in either direction.
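Because the function does not validate haplotype names itself, it may be worth checking them before building the model. The following is a minimal sketch of such a check; the helper check_microhap_names is hypothetical (it is not part of the package), and the requirement that all names at a locus have the same length is an assumption made here for illustration.

```r
# Hypothetical helper (not part of the package): returns TRUE when every
# haplotype name is a non-empty string over A, C, G, T and all names have
# the same number of characters (assumed here to be one character per SNP).
check_microhap_names <- function(hap_names) {
  valid_chars <- grepl("^[ACGT]+$", hap_names)
  same_length <- length(unique(nchar(hap_names))) == 1
  all(valid_chars) && same_length
}

check_microhap_names(c("GCA", "ACA", "ACG"))  # well-formed names
check_microhap_names(c("GCA", "AC", "ACN"))   # mixed lengths and an N
```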
ge_model_microhap1(
L,
miscall_rate = 0.005,
dropout_rate = 0.005,
per_snp_rate = FALSE
)
An element of the list created by long_markers_to_X_l_list. Such
an element basically holds the information at a single locus. The idea here
is that every ge_mod_* function takes in an object like L, and then can
use any piece of information in it about alleles or genotypes to
configure a genotyping error model.
The rate at which microhaplotype alleles are miscalled. If
per_snp_rate is TRUE, then this is the rate at which each SNP might
get miscalled, such that the overall miscall rate for microhaplotypes with more SNPs
is higher than for microhaplotypes with fewer SNPs.
Rate of allelic dropout.
Logical. If TRUE, then the overall miscall rate for the
microhaplotype locus is miscall_rate times the number of SNPs in the microhaplotype.
The default is FALSE, and this option is not really recommended.
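To make the effect of per_snp_rate concrete, here is a small sketch of the scaling described above. The helper overall_miscall_rate is hypothetical and is not the package's implementation; it also assumes that the number of SNPs equals the number of characters in a haplotype name.

```r
# Illustrative helper (hypothetical, not the package's code).
# Assumption: each character in a haplotype name is one SNP.
overall_miscall_rate <- function(hap_names, miscall_rate = 0.005,
                                 per_snp_rate = FALSE) {
  n_snps <- nchar(hap_names[1])
  if (per_snp_rate) miscall_rate * n_snps else miscall_rate
}

# For a three-SNP microhaplotype such as "GCA":
overall_miscall_rate("GCA")                       # per-locus rate, unchanged
overall_miscall_rate("GCA", per_snp_rate = TRUE)  # rate scaled by 3 SNPs
```

Under these assumptions, a locus with more SNPs gets a proportionally higher overall miscall rate when per_snp_rate is TRUE.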
# here is some example information about a microhap:
example_L_microhap
#> $freqs
#> GCA ACA ACG ATA GCG GTA
#> 0.543478261 0.260869565 0.086956522 0.065217391 0.038043478 0.005434783
#>
#> $geno_freqs
#> GCA / GCA GCA / ACA GCA / ACG GCA / ATA GCA / GCG GCA / GTA
#> 2.953686e-01 2.835539e-01 9.451796e-02 7.088847e-02 4.135161e-02 5.907372e-03
#> ACA / ACA ACA / ACG ACA / ATA ACA / GCG ACA / GTA ACG / ACG
#> 6.805293e-02 4.536862e-02 3.402647e-02 1.984877e-02 2.835539e-03 7.561437e-03
#> ACG / ATA ACG / GCG ACG / GTA ATA / ATA ATA / GCG ATA / GTA
#> 1.134216e-02 6.616257e-03 9.451796e-04 4.253308e-03 4.962193e-03 7.088847e-04
#> GCG / GCG GCG / GTA GTA / GTA
#> 1.447306e-03 4.135161e-04 2.953686e-05
#>
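# The $geno_freqs values printed above appear to be consistent with
# Hardy-Weinberg proportions computed from $freqs. The sketch below checks
# this for a few genotypes. The helper hw_geno_freq is hypothetical and is
# not part of the package; the allele frequencies are copied from the
# printed output above.

```r
freqs <- c(GCA = 0.543478261, ACA = 0.260869565, ACG = 0.086956522,
           ATA = 0.065217391, GCG = 0.038043478, GTA = 0.005434783)

# Hypothetical helper: Hardy-Weinberg frequency of the genotype a / b,
# given a named vector p of allele frequencies.
hw_geno_freq <- function(a, b, p) {
  if (a == b) p[[a]]^2 else 2 * p[[a]] * p[[b]]
}

hw_geno_freq("GCA", "GCA", freqs)  # compare with the GCA / GCA entry
hw_geno_freq("GCA", "ACA", freqs)  # compare with the GCA / ACA entry
hw_geno_freq("GTA", "GTA", freqs)  # compare with the GTA / GTA entry
```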
# now we can feed it into the function with default parameter values.