R/gscramble2newhybrids.R
gscramble2newhybrids.Rd
This function turns character-based alleles into integers and writes the necessary headers, etc. It preferentially uses the "id" column if it exists in M$ret_ids. Otherwise it uses the indiv column for the sample names.
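As a rough illustration of the allele recoding described above, here is a minimal base R sketch. It is not the package's implementation: `recode_locus()` is a hypothetical helper, and the two-digit codes with "00" for missing data are an assumption modeled on the `$allele_names` output shown in the examples.

```r
# Hypothetical helper, not part of 'gscramble': recode the two gene copies
# at a single locus from character alleles to two-digit integer codes.
recode_locus <- function(a1, a2) {
  # Alleles seen at this locus across both gene copies; sort() drops NA.
  observed <- sort(unique(c(a1, a2)))
  to_code <- function(x) {
    # Position of each allele in the sorted set, zero-padded to width 2.
    code <- sprintf("%02d", match(x, observed))
    # Missing alleles are given the code "00".
    code[is.na(x)] <- "00"
    code
  }
  # Paste the two gene copies together into one genotype string.
  paste0(to_code(a1), to_code(a2))
}

copy1 <- c("A", "G", "G", NA)
copy2 <- c("G", "G", "A", NA)
genotypes <- recode_locus(copy1, copy2)
print(genotypes)
```

Coding both gene copies against the same sorted allele set keeps the integer assigned to each allele consistent within a locus.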
gscramble2newhybrids(
M,
M_meta,
z = NULL,
s = NULL,
retain = NULL,
outfile = tempfile()
)
M
: The output from segments2markers() from 'gscramble'. This can have an added id column on it, which will then be used for the sample names.
M_meta
: The marker meta data file.
z
: A vector of length two. The values are regular expressions that should match the sample names you want to give -z 0 or -z 1. For example, c("SH", "CCT") means any sample matching "SH" gets z0 and any sample matching "CCT" gets z1.
s
: A single regular expression that matches the individuals that should be given the -s option. For example, "SH|CCT".
retain
: A vector of loci to retain.
outfile
: Path to the file to write the NewHybrids data set to. For CRAN compliance, this is a temp file by default, but you can change it to any valid path.
A list with three components:
outfile
: outfile name of saved data.
genos
: Genotypes
allele_names
: Allele names.
It allows you to set the -s and -z options by matching regular expressions against the sample names.
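The regular-expression mapping can be pictured with the base R sketch below. It is only an illustration under stated assumptions: `tag_samples()` and the sample names are hypothetical, and how the function itself encodes these options in the output file is not shown here.

```r
# Hypothetical helper, not part of 'gscramble': decide which samples would
# receive the z0, z1, and s options from regular expressions on their names.
tag_samples <- function(ids, z = NULL, s = NULL) {
  z_opt <- rep(NA_character_, length(ids))
  if (!is.null(z)) {
    stopifnot(length(z) == 2)
    z_opt[grepl(z[1], ids)] <- "z0"
    # In this sketch, a name matching both patterns ends up as z1.
    z_opt[grepl(z[2], ids)] <- "z1"
  }
  s_opt <- if (is.null(s)) rep(FALSE, length(ids)) else grepl(s, ids)
  data.frame(id = ids, z = z_opt, s = s_opt, stringsAsFactors = FALSE)
}

# Made-up sample names, for illustration only.
ids <- c("SH_001", "CCT_014", "hybrid_7")
tags <- tag_samples(ids, z = c("SH", "CCT"), s = "SH|CCT")
print(tags)
```

Because `grepl()` matches anywhere in the name, anchoring a pattern (for example "^SH") is a simple way to avoid tagging unintended samples.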
This function relies heavily on tidyverse functions for pivoting, etc. As such, it is not intended for data sets with tens of thousands of markers. You oughtn't be using NewHybrids with so many markers, anyway!
# get output from segments2markers():
example("segments2markers")
#>
#> sgmnt2> #### First, get input segments for the function ####
#> sgmnt2> # We construct an example here where we will request segregation
#> sgmnt2> # down a GSP with two F1s and F1B backcrosses between two hypothetical
#> sgmnt2> # populations, A and B.
#> sgmnt2> set.seed(5)
#>
#> sgmnt2> gsp_f1f1b <- create_GSP("A", "B", F1 = TRUE, F1B = TRUE)
#>
#> sgmnt2> # We will imagine that in our marker data there are three groups
#> sgmnt2> # labelled "Pop1", "Pop2", and "Pop3", and we want to create the F1Bs with backcrossing
#> sgmnt2> # only to Pop3.
#> sgmnt2> reppop <- tibble::tibble(
#> sgmnt2+ index = as.integer(c(1, 1, 2, 2)),
#> sgmnt2+ pop = c("A", "B", "A", "B"),
#> sgmnt2+ group = c("Pop3", "Pop1", "Pop3", "Pop2")
#> sgmnt2+ )
#>
#> sgmnt2> # combine those into a request
#> sgmnt2> request <- tibble::tibble(
#> sgmnt2+ gpp = list(gsp_f1f1b),
#> sgmnt2+ reppop = list(reppop)
#> sgmnt2+ )
#>
#> sgmnt2> # now segregate segments. Explicitly pass the markers
#> sgmnt2> # in M_meta so that the order of the markers is set efficiently.
#> sgmnt2> segs <- segregate(request, RecRates, M_meta)
#>
#> sgmnt2> #### Now, use segs in an example with segments2markers() ####
#> sgmnt2> # this uses several package data objects that are there for examples
#> sgmnt2> # and illustration.
#> sgmnt2> s2m_result <- segments2markers(segs, I_meta, M_meta, Geno)
# copy that result to a new variable
M <- s2m_result
# then run it
gscramble2newhybrids(M, M_meta)
#> $outfile
#> [1] "/tmp/Rtmp2pzuif/file190b21c0049"
#>
#> $genos
#> # A tibble: 78 × 102
#> useit opt_col WU_10.2_12_4469057 ALGA0064411 WU_10.2_12_7394362 ASGA0090707
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 n h-1-1-… 0202 0000 0101 0201
#> 2 2 n h-1-1-… 0102 0101 0101 0202
#> 3 3 n h-1-1-… 0102 0000 0102 0201
#> 4 4 n h-1-2-… 0102 0201 0102 0202
#> 5 5 n h-1-2-… 0102 0201 0102 0202
#> 6 6 n h-1-2-… 0202 0201 0101 0201
#> 7 7 n permed… 0202 0101 0102 0101
#> 8 8 n permed… 0202 0101 0202 0101
#> 9 9 n permed… 0202 0101 0202 0000
#> 10 10 n permed… 0202 0101 0202 0101
#> # ℹ 68 more rows
#> # ℹ 96 more variables: WU_10.2_12_8971475 <chr>, WU_10.2_12_11034660 <chr>,
#> # WU_10.2_12_11146365 <chr>, ASGA0053210 <chr>, ALGA0114537 <chr>,
#> # WU_10.2_12_15628863 <chr>, ALGA0108041 <chr>, ALGA0065869 <chr>,
#> # WU_10.2_12_31279548 <chr>, DRGA0011741 <chr>, WU_10.2_12_42676693 <chr>,
#> # M1GA0016777 <chr>, WU_10.2_12_46448883 <chr>, ALGA0066740 <chr>,
#> # MARC0036299 <chr>, ALGA0118253 <chr>, WU_10.2_12_57360470 <chr>, …
#>
#> $allele_names
#> # A tibble: 298 × 4
#> locus allele alle_int n
#> <chr> <chr> <chr> <int>
#> 1 17_13447160 A 01 42
#> 2 17_13447160 G 02 108
#> 3 17_13447160 NA 00 6
#> 4 17_17657093 C 01 102
#> 5 17_17657093 T 02 38
#> 6 17_17657093 NA 00 16
#> 7 ALGA0049523 A 01 86
#> 8 ALGA0049523 G 02 64
#> 9 ALGA0049523 NA 00 6
#> 10 ALGA0064411 C 01 136
#> # ℹ 288 more rows
#>