implements the generalized Allele-Based genotyping error model — general_allele_based_geno_err

This is the version into which you pass in your own matrices of allele call probabilities W (corresponding to omega) and Ws (omega-star) and vectors of allele-specific dropout rates. Notation follows the paper for the most part. So see it for details. Note that this is vectorized over g, gp, h, and hp. Note that g, gp, h, and hp can be character vectors giving the allele names, so long as the vector D has names which are the allele names and W and Ws have rownames and colnames which are the allele names.

general_allele_based_geno_err_model(g, gp, h, hp, D, W, Ws)

Arguments

g: allelic type of gene copy g in the true genotype.
gp: allelic type of gene copy g' in the true genotype
h: allelic type of one of the gene copies of the observed genotyped
hp: allelic type of the other gene copy of the observed genotyped
D: vector of allele-specific dropout rates (delta) in the paper. Can be named with the names of the alleles.
W: the matrix omega of allelic call probabilities. Can have rownames and colnames that are the allele names.
Ws: the matrix omega* of allelic call probabilities

Value

This function returns a vector that is in the order of the genotypes as they are listed according to g, gp, h, and hp.

Examples

# let's fabricate a locus with 5 alleles named "a1" through "a5"
# and make W random, but with a much higher chance of observing the
# correct allele.
set.seed(5)
alles <- paste("a", 1:5, sep = "")

# here is a data frame of all possible genotypes
pg <- expand.grid(g=alles, gp=alles, stringsAsFactors = FALSE) %>%
  dplyr::filter(g<=gp)

# and then we replicate that to get all possible combinations of true
# genotypes to observed genotypes
input_df <- cbind(pg[rep(1:nrow(pg), nrow(pg)), ],
                  pg[rep(1:nrow(pg), each = nrow(pg)), ])
names(input_df)[3:4] <- c("h", "hp")
input_df <- tibble::as_tibble(input_df)


# here we make the matrix W (just some random numbers)
W <- runif(length(alles)^2, 0.001, 0.02) %>%
  matrix(nrow = length(alles))
dimnames(W) <- list(alles, alles)
diag(W) <- diag(W) + 1  # make it most likely to call the right allele
W <- t(apply(W, 1, function(x) x/sum(x)))  # normalize them to be probablities

# make Wstar to be similar but less likely to miscall
Ws <- W
diag(Ws) <- 2
Ws <- t(apply(Ws, 1, function(x) x/sum(x)))

# finally make a random D
D <- runif(length(alles), min = 0.001, max = 0.05)
names(D) <- alles

# then compute the probs
gprobs <- input_df %>%
 dplyr::mutate(prob = general_allele_based_geno_err_model(g, gp, h, hp, D, W, Ws))

# now, confirm that, for each true genotype, when we sum over the
# probs of the observed genotypes, we get 1.
gprobs %>%
 dplyr::mutate(ggp = paste(g, gp, sep = "-")) %>%
  dplyr::group_by(ggp) %>%
  dplyr::summarise(probSum = sum(prob))
#> # A tibble: 15 × 2
#>    ggp   probSum
#>    <chr>   <dbl>
#>  1 a1-a1       1
#>  2 a1-a2       1
#>  3 a1-a3       1
#>  4 a1-a4       1
#>  5 a1-a5       1
#>  6 a2-a2       1
#>  7 a2-a3       1
#>  8 a2-a4       1
#>  9 a2-a5       1
#> 10 a3-a3       1
#> 11 a3-a4       1
#> 12 a3-a5       1
#> 13 a4-a4       1
#> 14 a4-a5       1
#> 15 a5-a5       1

# Eureka! it all checks out!