Return every pair of individuals that mismatch at no more than max_mismatch loci

This is used for identifying duplicate individuals/genotypes in large data sets. I've specified this in terms of the maximum number of mismatching loci because I think everyone should already have tossed out individuals with a lot of missing data. That makes it easy to toss out a pair as soon as it exceeds the limit, without even looking at all the loci, so the full set of comparisons runs faster.
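The early-rejection idea described above can be sketched for a single pair as follows. This is only an illustration in base R, not the package's implementation: the helper name `count_mismatch_early_exit`, the integer genotype coding, and the choice to skip loci that are missing in either individual are all assumptions made for the example.

```r
# Hypothetical helper (not part of the package): count mismatches between two
# genotype vectors, giving up as soon as the count exceeds max_mismatch.
# Assumption: genotypes are integer codes and NA marks a missing genotype.
count_mismatch_early_exit <- function(g1, g2, max_mismatch) {
  num_mismatch <- 0L
  num_loc <- 0L
  for (i in seq_along(g1)) {
    # Assumption: loci missing in either individual are skipped entirely
    if (is.na(g1[i]) || is.na(g2[i])) next
    num_loc <- num_loc + 1L
    if (g1[i] != g2[i]) {
      num_mismatch <- num_mismatch + 1L
      # Reject the pair without looking at the remaining loci
      if (num_mismatch > max_mismatch) return(NULL)
    }
  }
  list(num_mismatch = num_mismatch, num_loc = num_loc)
}

# Invented toy genotypes for three individuals at six loci
a <- c(0L, 1L, 2L, NA, 1L, 0L)
b <- c(0L, 1L, 1L, 2L, 1L, 0L)
d <- c(2L, 0L, 0L, 1L, 0L, 2L)

close_pair <- count_mismatch_early_exit(a, b, max_mismatch = 1L)
far_pair   <- count_mismatch_early_exit(a, d, max_mismatch = 1L)
```

Here `a` and `b` differ at one of the five loci typed in both, so the pair is kept, while `a` and `d` are rejected after the second locus and the helper returns `NULL`.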

find_close_matching_genotypes(LG, CK, max_mismatch)

Arguments

LG: a long genotypes data frame.
CK: a ckmr object created from the allele frequencies computed from LG.
max_mismatch: maximum allowable number of mismatching genotypes between the members of a pair.

Value

a data frame with columns:

indiv_1: the id (from the rownames in S) of the first member of the pair
indiv_2: the id (from the rownames in S) of the second individual of the pair
num_mismatch: the number of loci at which the pair have mismatching genotypes
num_loc: the total number of loci at which neither individual has a missing genotype
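To make the shape of the returned data frame concrete, the sketch below builds a table with the same four columns from a toy genotype matrix using base R only. It does not call find_close_matching_genotypes and is not how the package computes its result: the matrix `geno`, the ids, and the genotype codes are invented for illustration, and every pair is compared in full rather than being rejected early.

```r
# Toy genotype matrix (invented): rows are individuals, columns are loci,
# NA marks a missing genotype.
geno <- rbind(
  ind_A = c(0L, 1L, 2L, NA, 1L),
  ind_B = c(0L, 1L, 2L, 2L, 1L),
  ind_C = c(2L, 0L, 0L, 1L, 0L)
)
max_mismatch <- 1L

# Every unordered pair of ids, one pair per row
pairs <- t(combn(rownames(geno), 2))

rows <- lapply(seq_len(nrow(pairs)), function(k) {
  g1 <- geno[pairs[k, 1], ]
  g2 <- geno[pairs[k, 2], ]
  both <- !is.na(g1) & !is.na(g2)  # loci missing in neither individual
  data.frame(
    indiv_1 = pairs[k, 1],
    indiv_2 = pairs[k, 2],
    num_mismatch = sum(g1[both] != g2[both]),
    num_loc = sum(both),
    stringsAsFactors = FALSE
  )
})

# Keep only the pairs within the mismatch limit
result <- do.call(rbind, rows)
result <- result[result$num_mismatch <= max_mismatch, ]
```

With these toy data only the ind_A/ind_B pair survives the filter: the two individuals agree at all four loci typed in both, so that row has num_mismatch 0 and num_loc 4.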