Reads PLINK .ped and .map files (which may be gzipped) and converts them to the gscramble format. The binary PLINK formats (.bed and .bim) are not supported. The population specifier of each individual is taken from the first column (the FID column) of the .ped file.

Usage

plink2gscramble(ped = NULL, map = NULL, prefix = NULL, gz_ext = FALSE)

Arguments

ped

Path to the PLINK .ped file holding information about the individuals and their genotypes. This file may be gzipped. The function assumes that the individual IDs in the second column are unique across the whole file, regardless of family ID; if they are not, it issues a warning. Missing genotypes are assumed to be coded as 0 in this file.
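As an illustration of the uniqueness assumption above, the following base-R sketch shows one way such a check could be written. The helper check_unique_iid() and the toy data are hypothetical and are not part of gscramble; the package's own check and warning text may differ.

```r
# Toy stand-in for the first two columns of a .ped file (hypothetical data).
ped_ids <- data.frame(
  FID = c("popA", "popA", "popB"),
  IID = c("ind1", "ind2", "ind1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: warn when individual IDs (column 2) repeat,
# even if the repeats belong to different families (column 1).
check_unique_iid <- function(ids) {
  dups <- unique(ids[duplicated(ids)])
  if (length(dups) > 0) {
    warning("Individual IDs are not unique across families: ",
            paste(dups, collapse = ", "))
  }
  invisible(length(dups) == 0)  # TRUE when all IDs are unique
}

check_unique_iid(ped_ids$IID)  # warns about "ind1"
```

Checking the second column on its own, rather than the FID/IID pair, matches the stated assumption that individual IDs are unique across all family IDs.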

map

Path to the PLINK .map file holding information about the markers. This file may be gzipped.
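A PLINK .map file has four whitespace-separated columns per marker: chromosome, variant identifier, genetic position (Morgans or cM), and base-pair position. The sketch below reads a toy map from a string and arranges it with the column names listed for M_meta under Value. It is an illustration under that assumption, not the package's code, and it returns a plain data.frame rather than a tibble.

```r
# Toy .map content (hypothetical): chromosome, variant ID, genetic position, bp position.
map_txt <- "1 snp1 0.010 10500
1 snp2 0.025 23100
2 snp3 0.000 5400"

# For a real, possibly gzipped, file one could pass gzfile(path) instead of text = ...
map <- read.table(
  text = map_txt, header = FALSE,
  col.names  = c("chrom", "variant_id", "link_pos", "pos"),
  colClasses = c("character", "character", "numeric", "numeric")
)

# Reorder to the column order listed for M_meta.
M_meta_sketch <- map[, c("chrom", "pos", "variant_id", "link_pos")]
print(M_meta_sketch)
```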

prefix

Instead of giving explicit paths in ped and map, you can give a file prefix; the function then looks for the two files named by appending the .ped and .map extensions to the prefix.

gz_ext

Logical. If TRUE and the files are specified by prefix, a .gz extension is appended to the .ped and .map file names, so the files searched for are <prefix>.ped.gz and <prefix>.map.gz.
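The prefix and gz_ext behaviour described above amounts to simple string construction. The sketch below assumes the extensions are appended exactly as described; prefix_paths() is a hypothetical helper, not a gscramble function.

```r
# Hypothetical helper mirroring the described prefix / gz_ext behaviour.
prefix_paths <- function(prefix, gz_ext = FALSE) {
  ext <- if (gz_ext) ".gz" else ""
  c(ped = paste0(prefix, ".ped", ext),
    map = paste0(prefix, ".map", ext))
}

prefix_paths("data/example-plink")
prefix_paths("data/example-plink", gz_ext = TRUE)
```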

Value

A list with three components:

  • I_meta: metadata about the individuals in the file. It includes the columns group (the value in the first column of the ped file) and indiv (the individual ID from the second column of the ped file), along with the other four columns of the PLINK ped specification, named pa, ma, sex_code, and pheno.

  • M_meta: metadata about the markers. A tibble with the columns chrom, pos, variant_id, and link_pos. The link_pos column holds the genetic (linkage-map) position of each marker, in Morgans or cM, as given in the map file.

  • Geno: a character matrix of genotypes with one row per individual and two columns per marker (one for each allele), i.e., 2 × number-of-markers columns. Missing genotypes in this matrix are coded as NA.
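To make the layout of I_meta and Geno concrete, the sketch below splits a toy two-individual, two-marker .ped (given as a string) into the six metadata columns and a character genotype matrix with two columns per marker, recoding 0 to NA. It uses base R and plain data frames and is an illustration under these assumptions, not gscramble's implementation; in particular it does not reproduce any row or column names the real function may set.

```r
# Toy .ped content (hypothetical): 2 individuals, 2 markers, "0" = missing allele.
ped_txt <- "popA ind1 0 0 1 -9 A G C C
popB ind2 0 0 2 -9 G G 0 0"

# Read everything as character so alleles such as "T" are not coerced to logical.
ped <- read.table(text = ped_txt, header = FALSE, colClasses = "character")

# First six columns -> individual metadata, named as listed for I_meta.
I_meta_sketch <- setNames(ped[, 1:6],
                          c("group", "indiv", "pa", "ma", "sex_code", "pheno"))

# Remaining columns -> character matrix, two adjacent columns per marker.
Geno_sketch <- as.matrix(ped[, -(1:6)])
dimnames(Geno_sketch) <- NULL
Geno_sketch[Geno_sketch == "0"] <- NA  # recode missing alleles to NA

dim(Geno_sketch)  # 2 rows (individuals) by 4 columns (2 markers x 2 alleles)
```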

Examples

ped_plink <- system.file("extdata/example-plink.ped.gz", package = "gscramble")
map_plink <- system.file("extdata/example-plink.map.gz", package = "gscramble")

result <- plink2gscramble(ped_plink, map_plink)