Convert 'gscramble' output to NewHybrids format: gscramble2newhybrids()

This function converts character-based alleles into integer codes and writes the file headers that NewHybrids requires. For sample names it uses the "id" column of M$ret_ids if that column exists; otherwise it uses the indiv column.
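The integer coding can be illustrated with a small base-R sketch. The helper encode_locus() below is hypothetical and is not part of gscramble. It assumes each locus is held as two allele vectors, numbers the observed alleles in sorted order, codes a missing allele as 00, and pastes the two codes into a four-digit genotype. The sorted ordering is an assumption, though it is consistent with the allele_names table in the example output (A and C get 01; G and T get 02).

```r
# Hypothetical helper (not part of gscramble): recode one locus, given as
# two character vectors of alleles, into NewHybrids-style genotype strings.
encode_locus <- function(a1, a2) {
  # Observed alleles numbered in sorted order; sort() drops NA by default
  alleles <- sort(unique(c(a1, a2)))
  code <- function(x) {
    idx <- match(x, alleles)
    # Missing alleles become "00"; others become zero-padded integers
    ifelse(is.na(idx), "00", sprintf("%02d", idx))
  }
  # Paste the two allele codes into one four-digit genotype
  paste0(code(a1), code(a2))
}

encode_locus(c("A", "G", NA), c("G", "G", NA))
#> [1] "0102" "0202" "0000"
```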

gscramble2newhybrids(
  M,
  M_meta,
  z = NULL,
  s = NULL,
  retain = NULL,
  outfile = tempfile()
)

Arguments

M: the output of segments2markers() from 'gscramble'. If an id column has been added to it, that column is used for the sample names.
M_meta: the marker metadata.
z: a vector of length two of regular expressions. Samples whose names match the first expression get the -z 0 option, and samples whose names match the second get -z 1. For example, c("SH", "CCT") means any sample matching "SH" gets z0 and any sample matching "CCT" gets z1.
s: a single regular expression matching the individuals that should be given the -s option. For example, "SH|CCT".
retain: a vector of loci to retain.
outfile: path of the file to which the NewHybrids data set is written. For CRAN compliance this defaults to a temporary file, but any valid path may be given.

Value

A list with three components:

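outfile: the path of the saved NewHybrids data file.
genos: a tibble of the integer-coded genotypes, with an index column (useit) and an options column (opt_col) followed by one column per locus.
allele_names: a tibble giving, for each locus, the integer code (alle_int) assigned to each allele and the number of gene copies carrying it (n).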

Details

The -s and -z options can be assigned to individual samples by matching their names against regular expressions (see the z and s arguments).
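A minimal base-R sketch of that regular-expression mapping follows. The helper nh_options() is hypothetical and is not the package's implementation. It assumes that a sample matching both z patterns ends up with z1, and that the s flag is appended after any z flag.

```r
# Hypothetical helper (not part of gscramble): build NewHybrids per-sample
# option strings from regular expressions on the sample names.
nh_options <- function(ids, z = NULL, s = NULL) {
  opts <- rep("", length(ids))
  if (!is.null(z)) {
    opts[grepl(z[1], ids)] <- "z0"  # first pattern -> z0
    opts[grepl(z[2], ids)] <- "z1"  # second pattern -> z1 (wins on overlap)
  }
  if (!is.null(s)) {
    hit <- grepl(s, ids)
    # Append "s" after any z flag; trimws() drops the leading space otherwise
    opts[hit] <- trimws(paste(opts[hit], "s"))
  }
  opts
}

nh_options(c("SH_01", "CCT_07", "hyb_3"), z = c("SH", "CCT"), s = "SH|CCT")
#> [1] "z0 s" "z1 s" ""
```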

This function relies heavily on tidyverse functions for pivoting and similar operations, so it is not intended for data sets with tens of thousands of markers. You oughtn't be using NewHybrids with that many markers, anyway!

Examples

# get output from segments2markers():
example("segments2markers")
#> 
#> sgmnt2> #### First, get input segments for the function ####
#> sgmnt2> # We construct an example here where we will request segregation
#> sgmnt2> # down a GSP with two F1s and F1B backcrosses between two hypothetical
#> sgmnt2> # populations, A and B.
#> sgmnt2> set.seed(5)
#> 
#> sgmnt2> gsp_f1f1b <- create_GSP("A", "B", F1 = TRUE, F1B = TRUE)
#> 
#> sgmnt2> # We will imagine that in our marker data there are three groups
#> sgmnt2> # labelled "Pop1", "Pop2", and "Pop3", and we want to create the F1Bs with backcrossing
#> sgmnt2> # only to Pop3.
#> sgmnt2> reppop <- tibble::tibble(
#> sgmnt2+     index = as.integer(c(1, 1, 2, 2)),
#> sgmnt2+     pop = c("A", "B", "A", "B"),
#> sgmnt2+     group = c("Pop3", "Pop1", "Pop3", "Pop2")
#> sgmnt2+ )
#> 
#> sgmnt2> # combine those into a request
#> sgmnt2> request <- tibble::tibble(
#> sgmnt2+    gpp = list(gsp_f1f1b),
#> sgmnt2+    reppop = list(reppop)
#> sgmnt2+ )
#> 
#> sgmnt2> # now segegate segments.  Explicitly pass the markers
#> sgmnt2> # in M_meta so that the order of the markers is set efficiently.
#> sgmnt2> segs <- segregate(request, RecRates, M_meta)
#> 
#> sgmnt2> #### Now, use segs in an example with segments2markers() ####
#> sgmnt2> # this uses several package data objects that are there for examples
#> sgmnt2> # and illustration.
#> sgmnt2> s2m_result <- segments2markers(segs, I_meta, M_meta, Geno)
# copy that result to a new variable
M <- s2m_result

# then run it
gscramble2newhybrids(M, M_meta)
#> $outfile
#> [1] "/tmp/Rtmpc4n3J0/file1a18271e508"
#> 
#> $genos
#> # A tibble: 78 × 102
#>    useit opt_col   WU_10.2_12_4469057 ALGA0064411 WU_10.2_12_7394362 ASGA0090707
#>    <int> <chr>     <chr>              <chr>       <chr>              <chr>      
#>  1     1 n h-1-1-… 0202               0000        0101               0201       
#>  2     2 n h-1-1-… 0102               0101        0101               0202       
#>  3     3 n h-1-1-… 0102               0000        0102               0201       
#>  4     4 n h-1-2-… 0102               0201        0102               0202       
#>  5     5 n h-1-2-… 0102               0201        0102               0202       
#>  6     6 n h-1-2-… 0202               0201        0101               0201       
#>  7     7 n permed… 0202               0101        0102               0101       
#>  8     8 n permed… 0202               0101        0202               0101       
#>  9     9 n permed… 0202               0101        0202               0000       
#> 10    10 n permed… 0202               0101        0202               0101       
#> # ℹ 68 more rows
#> # ℹ 96 more variables: WU_10.2_12_8971475 <chr>, WU_10.2_12_11034660 <chr>,
#> #   WU_10.2_12_11146365 <chr>, ASGA0053210 <chr>, ALGA0114537 <chr>,
#> #   WU_10.2_12_15628863 <chr>, ALGA0108041 <chr>, ALGA0065869 <chr>,
#> #   WU_10.2_12_31279548 <chr>, DRGA0011741 <chr>, WU_10.2_12_42676693 <chr>,
#> #   M1GA0016777 <chr>, WU_10.2_12_46448883 <chr>, ALGA0066740 <chr>,
#> #   MARC0036299 <chr>, ALGA0118253 <chr>, WU_10.2_12_57360470 <chr>, …
#> 
#> $allele_names
#> # A tibble: 298 × 4
#>    locus       allele alle_int     n
#>    <chr>       <chr>  <chr>    <int>
#>  1 17_13447160 A      01          42
#>  2 17_13447160 G      02         108
#>  3 17_13447160 NA     00           6
#>  4 17_17657093 C      01         102
#>  5 17_17657093 T      02          38
#>  6 17_17657093 NA     00          16
#>  7 ALGA0049523 A      01          86
#>  8 ALGA0049523 G      02          64
#>  9 ALGA0049523 NA     00           6
#> 10 ALGA0064411 C      01         136
#> # ℹ 288 more rows
#>