Test the effects of the parametric bootstrap bias correction on a reference dataset through cross-validation
Source: R/assess_pb_bias_correction.R
assess_pb_bias_correction.Rd
This is a rewrite of bias_comparison(). Eric didn't want the plotting to be wrapped up in a function, and wanted to return a more informative data frame.
Usage
assess_pb_bias_correction(
reference,
gen_start_col,
seed = 5,
nreps = 50,
mixsize = 100,
alle_freq_prior = list(const_scaled = 1)
)
Arguments
- reference
a genetic dataset in two-column format, with a "repunit" column specifying each individual's reporting unit of origin, a "collection" column specifying the collection (population or time of sampling), and an "indiv" column providing a unique name for each individual
- gen_start_col
the first column containing genetic data in reference. All columns from this one onward should contain genetic data, and gene copies from the same locus should be adjacent
- seed
the random seed for simulations
- nreps
The number of replicate simulations to run.
- mixsize
The size of each simulated mixture sample.
- alle_freq_prior
a one-element named list specifying the prior to be used when generating Dirichlet parameters for genotype likelihood calculations. Valid methods include
"const", "const_scaled" (the default shown under Usage), and "empirical". See ?list_diploid_params for method details.
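As a rough illustration of the inputs described above, the sketch below builds a toy reference in two-column format and works out gen_start_col from it. The locus column names (Loc1, Loc1.1, and so on) and all values are made up for illustration, and only the columns named in the argument descriptions are shown; a real dataset may need more.

```r
# Toy reference in two-column format: two loci, each stored as two
# adjacent columns (one per gene copy). All values are hypothetical.
reference <- data.frame(
  repunit    = c("A", "A", "B"),
  collection = c("A1", "A2", "B1"),
  indiv      = c("ind1", "ind2", "ind3"),
  Loc1       = c(101L, 103L, 101L),
  Loc1.1     = c(103L, 103L, 105L),
  Loc2       = c(200L, 204L, NA),
  Loc2.1     = c(204L, 204L, NA),
  stringsAsFactors = FALSE
)

# gen_start_col is the index of the first genetic column.
gen_start_col <- which(names(reference) == "Loc1")

# Every column from gen_start_col onward holds genetic data, so the
# count of those columns must be even (two gene copies per locus).
n_gen_cols <- ncol(reference) - gen_start_col + 1
stopifnot(n_gen_cols %% 2 == 0)

# The prior is a one-element named list: the name picks the method,
# the value is its parameter.
alle_freq_prior <- list(const_scaled = 1)
stopifnot(length(alle_freq_prior) == 1, !is.null(names(alle_freq_prior)))
```

Looking the column up by name rather than hard-coding an index keeps gen_start_col correct if metadata columns are added or reordered.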
Value
bias_comparison() returns a list. The first element is
a list of the relevant rho values generated on each iteration of the random "mixture"
creation: the true rho value, the standard result rho_mcmc,
and the parametric-bootstrap-corrected rho_pb.
The second element is a data frame listing summary statistics for each
reporting unit and estimation method. mse, the mean squared error, summarizes
the deviation of the rho estimates from their true value, and reflects both bias and variance.
mean_prop_bias is the average ratio of residual to true value, which gives greater
weight to deviations at smaller true values. mean_bias is simply the average residual;
unlike mse, it shows the direction of the bias.
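The three summary statistics can be sketched in base R as follows. This is an illustration only, not the package's code: the numbers are invented, summarize_bias() is a hypothetical helper, and the residual is assumed to be estimate minus true value. Note that mean_prop_bias is undefined when a true proportion is exactly zero.

```r
# Hypothetical per-iteration results for a single reporting unit.
rho_est <- data.frame(
  iter     = 1:4,
  true_rho = c(0.10, 0.20, 0.40, 0.50),
  rho_mcmc = c(0.14, 0.23, 0.38, 0.47),
  rho_pb   = c(0.11, 0.20, 0.41, 0.49)
)

# Summarize one estimator against the true values.
# Assumption: residual = estimate - truth.
summarize_bias <- function(est, truth) {
  resid <- est - truth
  c(
    mse            = mean(resid^2),       # bias and variance combined
    mean_prop_bias = mean(resid / truth), # residual relative to the truth
    mean_bias      = mean(resid)          # signed, so direction is visible
  )
}

# One row per estimation method.
bias_summary <- rbind(
  rho_mcmc = summarize_bias(rho_est$rho_mcmc, rho_est$true_rho),
  rho_pb   = summarize_bias(rho_est$rho_pb,   rho_est$true_rho)
)
bias_summary
```

In a real analysis the same summary would be computed separately for each reporting unit, for example by splitting the results on a repunit column first.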
Details
Takes a reference two-column genetic dataset, pulls a series of random "mixture" datasets with varying reporting unit proportions from this reference, and compares the results of GSI through standard MCMC vs. parametric-bootstrap MCMC bias correction.
The amount of bias in reporting unit proportion calculations increases with the rate of misassignment between reporting units (that is, it decreases as genetic differentiation increases), and increases as the number of collections within reporting units becomes more uneven.
Output from the standard Bayesian MCMC method demonstrates the level of bias to be expected for the input data set, and parametric bootstrapping is an empirical method for the removal of any existing bias.
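The mixture-drawing step described in these details can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual implementation: the true proportions are drawn from a flat Dirichlet, individuals are sampled without replacement, and the toy reference carries only metadata columns.

```r
set.seed(5)

# Toy stand-in for a reference dataset (metadata columns only).
reference <- data.frame(
  repunit    = rep(c("A", "B", "C"), times = c(40, 30, 30)),
  collection = rep(c("A1", "A2", "B1", "C1"), times = c(25, 15, 30, 30)),
  indiv      = paste0("ind", 1:100),
  stringsAsFactors = FALSE
)

mixsize  <- 20
repunits <- unique(reference$repunit)

# Assumption: true reporting unit proportions come from a flat Dirichlet,
# obtained here by normalizing independent gamma variates.
g <- rgamma(length(repunits), shape = 1)
true_rho <- setNames(g / sum(g), repunits)

# Number of mixture individuals to take from each reporting unit.
counts <- setNames(
  as.vector(rmultinom(1, size = mixsize, prob = true_rho)),
  repunits
)

# Sample that many rows per unit without replacement. mixsize is smaller
# than every unit here, so the request never exceeds the pool.
mix_rows <- unlist(lapply(repunits, function(u) {
  pool <- which(reference$repunit == u)
  pool[sample.int(length(pool), counts[[u]])]
}))

mixture   <- reference[mix_rows, ]   # simulated mixture sample
remaining <- reference[-mix_rows, ]  # held out as the reference

# Compare the drawn proportions with those realized in the mixture.
true_rho
table(factor(mixture$repunit, levels = repunits)) / mixsize
```

Repeating this draw nreps times, and estimating the proportions for each mixture with and without the bootstrap correction, yields the per-iteration rho values that the summary statistics are computed from.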