This is used for identifying duplicate individuals/genotypes in large data sets. I've specified this in terms of the maximum number of mismatching loci because I think everyone should already have tossed out individuals with a lot of missing data. It also makes it easy to toss out a pair without even looking at all of its loci, which makes the full set of comparisons faster.
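The early-exit idea described above can be sketched in base R as follows. This is only an illustration, not the package's implementation: the helper name `count_mismatch_early_exit`, the integer genotype coding, and the choice to skip loci that are missing in either individual are all assumptions made for the example.

```r
# Hypothetical helper (not part of the package): count mismatching loci for one
# pair and give up as soon as the count exceeds max_mismatch.
# g1, g2: integer genotype codes, one per locus, with NA for missing data.
count_mismatch_early_exit <- function(g1, g2, max_mismatch) {
  stopifnot(length(g1) == length(g2))
  num_mismatch <- 0L
  num_loc <- 0L
  for (i in seq_along(g1)) {
    # Assumption: a locus missing in either individual is not compared.
    if (is.na(g1[i]) || is.na(g2[i])) next
    num_loc <- num_loc + 1L
    if (g1[i] != g2[i]) {
      num_mismatch <- num_mismatch + 1L
      # Reject the pair without scanning the remaining loci.
      if (num_mismatch > max_mismatch) return(NULL)
    }
  }
  list(num_mismatch = num_mismatch, num_loc = num_loc)
}

a <- c(0L, 1L, 2L, NA, 1L, 0L)
b <- c(0L, 1L, 1L, 2L, 1L, 0L)

res_ok <- count_mismatch_early_exit(a, b, max_mismatch = 1L)        # pair is kept
res_rejected <- count_mismatch_early_exit(a, b, max_mismatch = 0L)  # NULL: pair is dropped early
```

With a small `max_mismatch`, most non-duplicate pairs are rejected after only a few loci, which is where the speedup comes from.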

find_close_matching_genotypes(LG, CK, max_mismatch)

Arguments

LG

a long genotypes data frame.

CK

a ckmr object created from the allele frequencies computed from LG.

max_mismatch

maximum allowable number of mismatching genotypes between the pairs.

Value

a data frame with columns:

indiv_1

the id (from the rownames in S) of the first member of the pair

indiv_2

the id (from the rownames in S) of the second individual of the pair

num_mismatch

the number of loci at which the pair have mismatching genotypes

num_loc

the total number of loci at which neither individual is missing data
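
To make the shape of the returned data frame concrete, here is a minimal base R sketch that builds a table with the same four columns from a toy genotype matrix. It does not call the package; the matrix `geno`, the individual ids, and the 0/1/2 genotype coding are made up for illustration, and it compares every pair in full rather than exiting early.

```r
# Toy genotype matrix: rows are individuals, columns are loci, NA = missing.
# The ids and the 0/1/2 coding are assumptions for this example only.
geno <- rbind(
  fish_A = c(0L, 1L, 2L, 1L, 0L),
  fish_B = c(0L, 1L, 2L, NA, 0L),
  fish_C = c(2L, 0L, 1L, 1L, 2L)
)
max_mismatch <- 1L

# Each column of `pairs` holds the ids of one unordered pair of individuals.
pairs <- utils::combn(rownames(geno), 2)

rows <- lapply(seq_len(ncol(pairs)), function(k) {
  g1 <- geno[pairs[1, k], ]
  g2 <- geno[pairs[2, k], ]
  both <- !is.na(g1) & !is.na(g2)  # loci missing in neither individual
  data.frame(
    indiv_1 = pairs[1, k],
    indiv_2 = pairs[2, k],
    num_mismatch = sum(g1[both] != g2[both]),
    num_loc = sum(both),
    stringsAsFactors = FALSE
  )
})
all_pairs <- do.call(rbind, rows)

# Keep only the pairs at or below the mismatch threshold.
close_matches <- all_pairs[all_pairs$num_mismatch <= max_mismatch, ]
```

Here only the `fish_A`/`fish_B` pair survives the filter. Reporting `num_loc` alongside `num_mismatch` lets you judge how much data a low mismatch count is actually based on.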