These are notes that Eric kept while putting this package together, mostly to remind himself of how everything flows and to record short explanations of how the pieces fit together.

Marker definitions

Markers have to be defined in a data frame. Here is an example:

long_markers
## # A tibble: 53,637 × 7
##    Chrom Locus     Pos Allele LocIdx AlleIdx   Freq
##    <int> <chr>   <dbl> <chr>   <int>   <int>  <dbl>
##  1     4 chr4-1   3845 a1          1       1 0.790 
##  2     4 chr4-1   3845 a2          1       2 0.210 
##  3     4 chr4-2  71520 a1          2       1 0.485 
##  4     4 chr4-2  71520 a3          2       2 0.237 
##  5     4 chr4-2  71520 a2          2       3 0.167 
##  6     4 chr4-2  71520 a4          2       4 0.112 
##  7     4 chr4-3 105104 a3          3       1 0.499 
##  8     4 chr4-3 105104 a2          3       2 0.402 
##  9     4 chr4-3 105104 a1          3       3 0.0989
## 10     4 chr4-4 256481 a4          4       1 0.402 
## # ℹ 53,627 more rows

The data frame has to be in this format. It needs the columns Chrom (integer or character), Locus (character), Pos (double or integer), Allele (character, even if the allele is a length), LocIdx and AlleIdx (integers giving the desired order of the loci and of the alleles within each locus), and Freq (the allele frequency). You can use reindex_markers to sort the alleles into allele-frequency order and re-index them; in the example above, AlleIdx 1 is the most frequent allele at each locus.
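The idea behind re-indexing can be pictured with the base-R sketch below. It is not the package's reindex_markers: `reindex_by_freq()` is a hypothetical helper, it assumes loci should keep their order of first appearance, and the toy frequencies are made up.

```r
# Hypothetical helper (NOT the package's reindex_markers()): number loci in
# order of first appearance, then sort alleles within each locus by
# decreasing frequency and number them 1, 2, 3, ...
reindex_by_freq <- function(M) {
  M$LocIdx <- match(M$Locus, unique(M$Locus))
  M <- M[order(M$LocIdx, -M$Freq), ]
  # within each locus, AlleIdx counts up from 1 in the sorted order
  M$AlleIdx <- ave(seq_len(nrow(M)), M$LocIdx, FUN = seq_along)
  rownames(M) <- NULL
  M
}

# Toy input with made-up frequencies; LocIdx and AlleIdx are not yet set.
toy_markers <- data.frame(
  Chrom   = 4L,
  Locus   = c("chr4-1", "chr4-1", "chr4-2", "chr4-2", "chr4-2"),
  Pos     = c(3845, 3845, 71520, 71520, 71520),
  Allele  = c("a2", "a1", "a2", "a1", "a3"),
  LocIdx  = NA_integer_,
  AlleIdx = NA_integer_,
  Freq    = c(0.21, 0.79, 0.20, 0.50, 0.30),
  stringsAsFactors = FALSE
)

reindexed <- reindex_by_freq(toy_markers)
reindexed
```

After re-indexing, allele a1 gets AlleIdx 1 at both loci because it is the most frequent there, matching the layout of long_markers.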

If you don’t know the chromosome or position of a marker, you can put any placeholder values in the Chrom and Pos columns.
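As an illustration, the hypothetical helper below fills in placeholders for markers with no known location. The scheme ("Unk" for every chromosome, and each locus's order of appearance as its position) is an arbitrary choice; any values of the allowed types would do.

```r
# Hypothetical helper: give every marker placeholder Chrom and Pos values
# when the real ones are unknown. The scheme used here is arbitrary.
add_placeholder_positions <- function(M) {
  M$Chrom <- "Unk"                             # character Chrom is allowed
  M$Pos <- match(M$Locus, unique(M$Locus))     # integer Pos is allowed
  # put Chrom, Locus, Pos first, as in the long_markers example
  M[, c("Chrom", "Locus", "Pos", setdiff(names(M), c("Chrom", "Locus", "Pos")))]
}

unplaced <- data.frame(
  Locus  = c("locA", "locA", "locB"),
  Allele = c("120", "124", "98"),   # allele lengths stored as character
  Freq   = c(0.6, 0.4, 1.0),
  stringsAsFactors = FALSE
)

placed <- add_placeholder_positions(unplaced)
placed
```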

Genotyping error

There are two main types of genotyping error models, but so far I have implemented only the allele-based one: general_allele_based_geno_err_model.
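To make "allele-based" concrete, here is a sketch of one simple model of that kind. It assumes that each of the two alleles in a genotype is miscalled independently with probability eps, and that a miscalled allele becomes any of the other K - 1 alleles with equal probability. This is an illustration only: `allele_err_matrix()` is hypothetical, and general_allele_based_geno_err_model may be parameterized differently.

```r
# Sketch of a simple allele-based genotyping error model (an assumption,
# not necessarily what general_allele_based_geno_err_model() implements).
# Returns P(observed genotype | true genotype) over unordered genotypes.
allele_err_matrix <- function(K, eps) {
  stopifnot(K >= 2, eps >= 0, eps <= 1)
  # A[x, y] = P(allele called as y | true allele x)
  A <- matrix(eps / (K - 1), K, K)
  diag(A) <- 1 - eps
  # unordered genotypes i/j with i <= j
  G <- which(upper.tri(diag(K), diag = TRUE), arr.ind = TRUE)
  labels <- paste(G[, 1], G[, 2], sep = "/")
  n <- nrow(G)
  E <- matrix(0, n, n, dimnames = list(true = labels, obs = labels))
  for (ti in seq_len(n)) {
    a <- G[ti, 1]; b <- G[ti, 2]
    for (oi in seq_len(n)) {
      x <- G[oi, 1]; y <- G[oi, 2]
      # a heterozygous call can arise from either pairing of true alleles
      E[ti, oi] <- if (x == y) {
        A[a, x] * A[b, y]
      } else {
        A[a, x] * A[b, y] + A[a, y] * A[b, x]
      }
    }
  }
  E
}

E <- allele_err_matrix(K = 3, eps = 0.01)
round(E, 5)
```

Each row is a probability distribution over observed genotypes, so every row sums to 1.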