Collect essential data values before mixture proportion estimation
Source: R/data_conversion.R
list_diploid_params.Rd
Takes all relevant information created in previous steps of the data conversion pipeline and combines it into a single list, which serves as input for further calculations.
Usage
list_diploid_params(
AC_list,
I_list,
PO,
coll_N,
RU_vec,
RU_starts,
alle_freq_prior = list(const_scaled = 1)
)
Arguments
- AC_list
a list of allele count matrices; output from a_freq_list
- I_list
a list of genotype vectors; output from allelic_list
- PO
a vector of collection (population of origin) indices for every individual in the sample, in order identical to I_list
- coll_N
a vector of the total number of individuals in each collection, in order of appearance in the dataset
- RU_vec
a vector of collection indices, sorted by reporting unit
- RU_starts
a vector of indices, designating the first collection for each reporting unit in RU_vec
- alle_freq_prior
a one-element named list specifying the prior to be used when generating Dirichlet parameters for genotype likelihood calculations. The name of the list item determines the type of prior used, with options "const", "const_scaled", and "empirical". If "const", the listed number is added as a constant to the count for each allele, locus, and collection. If "const_scaled", the listed number is divided by the number of alleles at a locus, then added to the allele counts. If "empirical", the listed number is multiplied by the relative frequency of each allele across all populations, then added to the allele counts.
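The three prior types described for alle_freq_prior can be illustrated on a toy allele count matrix. This is a minimal base R sketch, not the package's own code: the matrix `ac`, its orientation (alleles in rows, collections in columns), and the names `prior_value`, `dp_const`, `dp_scaled`, and `dp_empirical` are all assumptions made for illustration.

```r
# Toy allele counts for one locus: rows are alleles, columns are collections
# (this orientation is an assumption of the sketch).
ac <- matrix(c(4, 0, 2,
               1, 3, 2),
             nrow = 3,
             dimnames = list(paste0("allele", 1:3), c("collA", "collB")))

prior_value <- 1  # the single number carried in the named list

# Constant prior: add the number to every allele count.
dp_const <- ac + prior_value

# Scaled prior: divide the number by the number of alleles at the locus first.
dp_scaled <- ac + prior_value / nrow(ac)

# Empirical prior: weight the number by each allele's relative frequency,
# pooled over all collections. The length-3 vector is recycled down each
# column, so every row (allele) receives its own increment.
pooled_freq <- rowSums(ac) / sum(ac)
dp_empirical <- ac + prior_value * pooled_freq
```

In the scaled and empirical variants the total amount added per collection equals the listed number, whereas the constant variant adds that number once per allele.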
Value
list_diploid_params returns a list of the information necessary
for the calculation of genotype likelihoods in MCMC:
L, N, and C represent the number of loci, individual genotypes,
and collections, respectively. A is a vector of the number of alleles at each
locus, and CA is the cumulative sum of A. coll, coll_N,
RU_vec, and RU_starts are copied directly from input.
I, AC, sum_AC, DP, and sum_DP are vectorized
versions of data previously represented as lists and matrices; indexing macros
use L, N, C, A, and CA to access these vectors
in later Rcpp-based calculations.
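To make the role of C, A, and CA concrete, the following sketch flattens a toy list of allele count matrices and looks up one entry by locus, collection, and allele. It is a base R illustration under stated assumptions, not the package's indexing macros: the matrices are assumed to have alleles in rows and collections in columns, and `ac_index` is a hypothetical helper using 1-based positions.

```r
# Toy data: 2 loci, 3 collections; locus 1 has 2 alleles, locus 2 has 3.
AC_list <- list(
  matrix(1:6,  nrow = 2),   # locus 1: 2 alleles x 3 collections
  matrix(7:15, nrow = 3)    # locus 2: 3 alleles x 3 collections
)
C  <- 3
A  <- vapply(AC_list, nrow, integer(1))  # alleles per locus
CA <- cumsum(A)                          # cumulative allele counts

# unlist() walks each matrix column by column, so the allele index varies
# fastest, then the collection, then the locus.
AC <- unlist(AC_list)

# Hypothetical helper: 1-based position in AC of (locus l, collection c, allele a).
ac_index <- function(l, c, a) {
  alleles_before <- CA[l] - A[l]          # alleles at all earlier loci
  C * alleles_before + (c - 1) * A[l] + a
}

# The flat lookup agrees with the matrix lookup.
stopifnot(AC[ac_index(2, 3, 1)] == AC_list[[2]][1, 3])
stopifnot(AC[ac_index(1, 2, 2)] == AC_list[[1]][2, 2])
```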
Details
Genotypes represented in I_list are converted into a single long vector,
ordered by locus, individual, and gene copy, with NA values represented as 0s.
Similarly, AC_list is unlisted to AC, ordered by locus, collection,
and allele. DP holds the Dirichlet prior parameters for likelihood calculations, created
by adding the values calculated from alle_freq_prior to each allele count.
sum_AC and sum_DP are the sums, over the alleles at each locus,
of their parent vectors, ordered by locus and collection.
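The flattening described above can be sketched in base R on toy inputs. This is an illustration only, not the package's implementation: the layout of `I_list` (one element per locus holding gene copies `a` and `b`) and the orientation of the allele count matrices (alleles in rows, collections in columns) are assumptions.

```r
# Toy genotypes: one element per locus, each holding gene copies a and b
# for 3 individuals (this layout is an assumption of the sketch).
I_list <- list(
  locus1 = list(a = c(1L, 2L, NA), b = c(2L, 2L, NA)),
  locus2 = list(a = c(3L, 1L, 1L), b = c(1L, NA, 2L))
)

# rbind() stacks a over b; as.vector() then reads column by column, so the
# gene copy varies fastest, then the individual, then the locus.
I <- unlist(lapply(I_list, function(g) as.vector(rbind(g$a, g$b))),
            use.names = FALSE)
I[is.na(I)] <- 0L   # missing gene copies are stored as 0

# Toy allele counts: 2 loci, 3 collections.
AC_list <- list(matrix(1:6, nrow = 2), matrix(7:15, nrow = 3))

# Sum over alleles within each locus and collection, ordered by locus,
# then collection.
sum_AC <- unlist(lapply(AC_list, colSums))
```

Here `I` has length 2 loci x 3 individuals x 2 gene copies, and `sum_AC` has one entry per locus and collection.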
Examples
example(allelic_list)
#>
#> alllc_> example(a_freq_list)
#>
#> a_frq_> # Generate a list of individual genotypes by allele from
#> a_frq_> # the alewife data's reference allele count tables
#> a_frq_> example(reference_allele_counts)
#>
#> rfrn__> ## count alleles in alewife reference populations
#> rfrn__> example(tcf2long) # gets variable ale_long
#>
#> tcf2ln> ## Convert the alewife dataset for further processing
#> tcf2ln> # the data frame passed into this function must have had
#> tcf2ln> # character collections and repunits converted to factors
#> tcf2ln> reference <- alewife
#>
#> tcf2ln> reference$repunit <- factor(reference$repunit, levels = unique(reference$repunit))
#>
#> tcf2ln> reference$collection <- factor(reference$collection, levels = unique(reference$collection))
#>
#> tcf2ln> ale_long <- tcf2long(reference, 17)
#>
#> rfrn__> ale_rac <- reference_allele_counts(ale_long$long)
#>
#> a_frq_> ale_ac <- a_freq_list(ale_rac)
#>
#> alllc_> ale_cs <- ale_long$clean_short
#>
#> alllc_> # Get the vectors of gene copies a and b for all loci in integer index form
#> alllc_> ale_alle_list <- allelic_list(ale_cs, ale_ac)$int
PO <- as.integer(factor(ale_long$clean_short$collection))
coll_N <- as.vector(table(PO))
Colls_by_RU <- dplyr::count(ale_long$clean_short, repunit, collection) %>%
  dplyr::filter(n > 0) %>%
  dplyr::select(-n)
PC <- rep(0, length(unique(Colls_by_RU$repunit)))
for (i in seq_len(nrow(Colls_by_RU))) {
  PC[Colls_by_RU$repunit[i]] <- PC[Colls_by_RU$repunit[i]] + 1
}
RU_starts <- c(0, cumsum(PC))
RU_vec <- as.integer(Colls_by_RU$collection)
param_list <- list_diploid_params(ale_ac, ale_alle_list, PO, coll_N, RU_vec, RU_starts)
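Because RU_starts is built above as c(0, cumsum(PC)), its entries act as zero-based offsets into RU_vec. The sketch below shows one way to recover the collections of a given reporting unit from such vectors; the toy values and the helper `colls_in_ru` are hypothetical and not part of the package.

```r
# Toy values following the construction above: 3 reporting units holding
# 2, 1, and 3 collections respectively.
PC <- c(2, 1, 3)
RU_starts <- c(0, cumsum(PC))        # zero-based offsets, plus the total at the end
RU_vec <- c(4L, 1L, 6L, 2L, 3L, 5L)  # collection indices sorted by reporting unit

# Hypothetical helper: collections belonging to reporting unit r (1-based).
# Assumes every reporting unit holds at least one collection.
colls_in_ru <- function(r) {
  RU_vec[(RU_starts[r] + 1):RU_starts[r + 1]]
}

colls_in_ru(1)
colls_in_ru(3)
```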