Check for matching (or nearly matching) genotypes in a data frame

Simple function that compares all pairs of fish in the data frame and returns a tibble of the pairs that meet two conditions: both fish have non-missing genotypes at a fraction >= min_frac_non_miss of the loci, and their genotypes match at a fraction >= min_frac_matching of those shared non-missing loci.
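The two thresholds can be illustrated for a single pair with a minimal base R sketch. The helper `compare_pair()` is hypothetical and not part of the package. It assumes genotypes are stored in two-column format as alternating alleles, that a locus is missing if any of its alleles is NA in either fish, and that two genotypes match when their unordered allele pairs are identical. The package's own matching rule may differ.

```r
# Hypothetical helper (not the package's implementation): compare two fish.
# g1 and g2 hold alleles in two-column order: locus 1 allele 1, locus 1
# allele 2, locus 2 allele 1, and so on.
compare_pair <- function(g1, g2,
                         min_frac_non_miss = 0.7,
                         min_frac_matching = 0.9) {
  a1 <- matrix(g1, ncol = 2, byrow = TRUE)  # one row per locus
  a2 <- matrix(g2, ncol = 2, byrow = TRUE)
  n_loci <- nrow(a1)

  # A locus is shared non-missing only if no allele is NA in either fish.
  non_miss <- rowSums(is.na(a1)) == 0 & rowSums(is.na(a2)) == 0

  # Assumption: allele order within a locus does not matter, so compare
  # the smaller and the larger allele of each genotype separately.
  same_lo <- pmin(a1[, 1], a1[, 2]) == pmin(a2[, 1], a2[, 2])
  same_hi <- pmax(a1[, 1], a1[, 2]) == pmax(a2[, 1], a2[, 2])
  matching <- non_miss & same_lo & same_hi

  num_non_miss <- sum(non_miss)
  num_match <- sum(matching)

  # Report the pair only if both fractions reach their thresholds.
  reported <- num_non_miss > 0 &&
    num_non_miss / n_loci >= min_frac_non_miss &&
    num_match / num_non_miss >= min_frac_matching

  list(num_non_miss = num_non_miss, num_match = num_match, reported = reported)
}

# Five loci; locus 4 is missing in fish_a and locus 5 differs.
fish_a <- c(1, 2,  3, 3,  2, 4,  NA, NA,  5, 5)
fish_b <- c(2, 1,  3, 3,  2, 4,  1, 1,    5, 6)
res <- compare_pair(fish_a, fish_b)
```

Here 4 of 5 loci are shared non-missing (0.8 >= 0.7), but only 3 of those 4 match (0.75 < 0.9), so the pair would not be reported under the default thresholds.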

Usage

close_matching_samples(
  D,
  gen_start_col,
  min_frac_non_miss = 0.7,
  min_frac_matching = 0.9
)

Arguments

D: a genetic dataset in two-column format, with "repunit", "collection", and "indiv" columns, as well as a "sample_type" column whose entries are either all "reference" or a mix of "reference" and "mixture".
gen_start_col: the index of the first column of genetic data in D
min_frac_non_miss: the minimum fraction of loci at which both members of a pair must have non-missing genotypes for the pair to be reported
min_frac_matching: the minimum fraction of shared non-missing loci at which the two individuals' genotypes must match for them to be reported as a matching pair

Value

a tibble ...

Examples

# one pair found in the internal alewife data set:
close_matching_samples(alewife, 17)
#> Summary Statistics:
#> 
#> 1070 Individuals in Sample
#> 
#> 11 Loci: Aa046, Aa070, Aa074, Aa081, Aa091, Aa093, Ap010, Ap033, Ap038, Ap058, Ap071
#> 
#> 3 Reporting Units: NNE, SNE, MAT
#> 
#> 21 Collections: EMA, STG, PIS, MYS, MON, TBR, GIL, THA, BRI, CON, QUI, HOU, PEQ, MIA, HUD, DEL, NAN, RAP, CHO, ROA, ALL
#> 
#> 6.31% of allelic data identified as missing
#> # A tibble: 1 × 10
#>   num_non_miss num_match indiv_1 indiv_2 collection_1 collection_2 sample_type_1
#>          <int>     <int> <chr>   <chr>   <chr>        <chr>        <chr>        
#> 1           11        10 AEMME_… ASGME_… EMA          STG          reference    
#> # ℹ 3 more variables: repunit_1 <chr>, sample_type_2 <chr>, repunit_2 <chr>
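
As a quick check on the reported row, the arithmetic behind the two thresholds can be reproduced from the printed counts. This sketch assumes, following the argument descriptions, that the first fraction is num_non_miss divided by the total number of loci and the second is num_match divided by num_non_miss.

```r
# Counts taken from the printed output above.
n_loci <- 11        # loci listed in the summary statistics
num_non_miss <- 11  # loci typed in both fish
num_match <- 10     # loci at which the two genotypes match

# Assumed definitions of the two fractions (see lead-in).
frac_non_miss <- num_non_miss / n_loci
frac_matching <- num_match / num_non_miss

frac_non_miss >= 0.7     # default min_frac_non_miss
frac_matching >= 0.9     # default min_frac_matching
round(frac_matching, 3)  # 0.909
```

Both comparisons are TRUE, which is consistent with the pair being reported under the default thresholds.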