Check for matching (or close to matching) genotypes in a data frame
Source: R/close_matching_samples.R
Super simple function that looks at all pairs of fish in the data frame and returns a tibble of the pairs that share non-missing genotypes at a fraction >= min_frac_non_miss of the loci, and whose genotypes match at a fraction >= min_frac_matching of those shared non-missing loci.
Arguments
- D
a two-column format genetic dataset, with "repunit", "collection", and "indiv" columns, as well as a "sample_type" column whose entries are either all "reference" or a mix of "reference" and "mixture".
- gen_start_col
the first column of genetic data in reference
- min_frac_non_miss
the fraction of loci at which both members of a pair must have non-missing genotypes for the pair to be reported
- min_frac_matching
the fraction of shared non-missing loci at which the two individuals' genotypes must match for them to be reported as a matching pair.
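To make the two thresholds concrete, here is a minimal base R sketch of the comparison for a single pair of individuals. It is not the package's implementation: `pair_match_summary()` is a hypothetical helper, the rule that a locus matches only when both alleles agree regardless of order is an assumption, and the threshold values 0.7 and 0.9 are illustrative only.

```r
# Hypothetical helper, not part of the package. Each individual is a
# 2-row matrix: one row per gene copy, one column per locus, NA = missing.
pair_match_summary <- function(g1, g2) {
  # a locus counts as shared only if neither individual is missing an allele
  non_miss <- colSums(is.na(g1)) == 0 & colSums(is.na(g2)) == 0
  # assumption: genotypes match when the sorted allele pairs are identical
  s1 <- apply(g1, 2, sort, na.last = TRUE)
  s2 <- apply(g2, 2, sort, na.last = TRUE)
  match <- non_miss & colSums(s1 == s2, na.rm = TRUE) == 2
  c(num_loci = ncol(g1), num_non_miss = sum(non_miss), num_match = sum(match))
}

# toy genotypes at 5 loci (made-up allele sizes)
g1 <- matrix(c(101, 103, 200, 200, 150, 154, NA, NA, 120, 122), nrow = 2)
g2 <- matrix(c(103, 101, 200, 200, 150, 150, 98, 98, 120, 122), nrow = 2)

s <- pair_match_summary(g1, g2)

# fraction of loci that are non-missing in both individuals
frac_non_miss <- s[["num_non_miss"]] / s[["num_loci"]]
# fraction of those shared loci at which the genotypes match
frac_matching <- s[["num_match"]] / s[["num_non_miss"]]

# illustrative thresholds (assumed values, chosen for this example)
min_frac_non_miss <- 0.7
min_frac_matching <- 0.9
reported <- frac_non_miss >= min_frac_non_miss &&
  frac_matching >= min_frac_matching
```

In this toy pair, 4 of the 5 loci are non-missing in both individuals and 3 of those 4 match, so the pair clears the first threshold but not the second and would not be reported under these assumed settings.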
Examples
# one pair found in the internal alewife data set:
close_matching_samples(alewife, 17)
#> Summary Statistics:
#>
#> 1070 Individuals in Sample
#>
#> 11 Loci: Aa046, Aa070, Aa074, Aa081, Aa091, Aa093, Ap010, Ap033, Ap038, Ap058, Ap071
#>
#> 3 Reporting Units: NNE, SNE, MAT
#>
#> 21 Collections: EMA, STG, PIS, MYS, MON, TBR, GIL, THA, BRI, CON, QUI, HOU, PEQ, MIA, HUD, DEL, NAN, RAP, CHO, ROA, ALL
#>
#> 6.31% of allelic data identified as missing
#> # A tibble: 1 × 10
#> num_non_miss num_match indiv_1 indiv_2 collection_1 collection_2 sample_type_1
#> <int> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 11 10 AEMME_… ASGME_… EMA STG reference
#> # ℹ 3 more variables: repunit_1 <chr>, sample_type_2 <chr>, repunit_2 <chr>