R/plink2gscramble.R
Reads plink .ped and .map files. The files may be gzipped, but the binary .bed or .bim plink format is not supported. The population of each individual is taken from the first column (the FID column) of the .ped file.
plink2gscramble(ped = NULL, map = NULL, prefix = NULL, gz_ext = FALSE)
ped: Path to the plink .ped file holding information about the individuals and their genotypes. This file can be gzipped. The second column of this file (the individual ID) is assumed to be unique across all family IDs; if it is not, the function issues a warning. Missing genotypes are assumed to be denoted by 0 in this file.
map: Path to the plink .map file holding information about the markers. This file can be gzipped.
prefix: If ped and map are not given as explicit file paths, you can give a prefix instead; the function then looks for the two files named by appending the .ped and .map extensions to the prefix.
gz_ext: Logical. If TRUE and the files are specified by prefix, a .gz extension is added to the names of the map and ped files.
A list with three components:
I_meta: metadata about the individuals in the file. This includes the columns group (the value of the first column of the ped file) and indiv (the ID of the individual, stored in the second column of the ped file). It also includes the other four columns of the plink ped specification, named pa, ma, sex_code, and pheno.
M_meta: metadata about the markers. A tibble with the columns chrom, pos, variant_id, and link_pos. The link_pos column holds the marker position in Morgans or cM that was included in the map file.
Geno: a character matrix of genotypes with number-of-individuals rows and number-of-markers * 2 columns. Missing genotypes in this matrix are coded as NA.
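As a rough illustration of the input layout and output shapes described above, the following base R sketch builds a toy .ped/.map pair and reshapes it by hand. It is not the package's implementation: the file contents and the variable names (ped_lines, map_lines, meta, geno) are invented for this example, and only base R functions are used.

```r
# Toy .ped content: FID, IID, pa, ma, sex_code, pheno, then two allele
# columns per marker. An allele of 0 denotes a missing genotype.
ped_lines <- c(
  "popA ind1 0 0 1 -9 A G C C",
  "popA ind2 0 0 2 -9 A A 0 0",
  "popB ind3 0 0 1 -9 G G C T"
)
# Toy .map content: chromosome, variant ID, position in Morgans or cM,
# base-pair position.
map_lines <- c(
  "1 snp1 0.01 1000",
  "1 snp2 0.02 2000"
)

ped_file <- tempfile(fileext = ".ped")
map_file <- tempfile(fileext = ".map")
writeLines(ped_lines, ped_file)
writeLines(map_lines, map_file)

# Read every column as character so that alleles such as "T" are not
# converted to logical values.
ped <- read.table(ped_file, colClasses = "character")
map <- read.table(map_file, colClasses = "character")

# The first six columns are per-individual metadata.
meta <- ped[, 1:6]
names(meta) <- c("group", "indiv", "pa", "ma", "sex_code", "pheno")

# The remaining columns are alleles, two per marker; recode 0 as NA.
geno <- as.matrix(ped[, -(1:6)])
dimnames(geno) <- NULL
geno[geno == "0"] <- NA

# Sanity checks mirroring the assumptions stated in the documentation.
stopifnot(ncol(geno) == 2 * nrow(map))   # two allele columns per marker
stopifnot(!anyDuplicated(meta$indiv))    # individual IDs are unique
```

Here geno has one row per individual and two columns per marker, which matches the shape documented for the Geno component; the real function additionally returns the marker metadata as a tibble.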
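library(gscramble)
# Paths to the gzipped example files shipped with the package
ped_plink <- system.file("extdata/example-plink.ped.gz", package = "gscramble")
map_plink <- system.file("extdata/example-plink.map.gz", package = "gscramble")
# Read both files into a list with components I_meta, M_meta, and Geno
result <- plink2gscramble(ped_plink, map_plink)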