downsample the number of individuals sampled — downsample

Randomly discards individuals from the sample until the desired number remains, then returns only those pairs in which both members are among the retained samples.
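The idea can be sketched on toy data in base R. This is only an illustration under simplifying assumptions, not the package's implementation: `downsample_sketch()`, `S_toy`, and `P_toy` are hypothetical names, each individual is treated as a single sampling unit, and the pair columns are assumed to be named `id_1` and `id_2` as in the example output on this page.

```r
# Toy stand-ins for S and P (hypothetical data, not CKMRpop output)
set.seed(1)
S_toy <- data.frame(ID = paste0("ind", 1:10), stringsAsFactors = FALSE)
P_toy <- data.frame(
  id_1 = c("ind1", "ind2", "ind3", "ind7"),
  id_2 = c("ind2", "ind5", "ind9", "ind8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper that mimics the described behaviour
downsample_sketch <- function(S, P, n) {
  # draw n IDs at random, without replacement
  keep <- sample(S$ID, size = n)
  list(
    # samples whose ID was drawn
    ds_samples = S[S$ID %in% keep, , drop = FALSE],
    # pairs in which BOTH members were drawn
    ds_pairs = P[P$id_1 %in% keep & P$id_2 %in% keep, , drop = FALSE]
  )
}

res <- downsample_sketch(S_toy, P_toy, n = 6)
nrow(res$ds_samples)  # number of retained individuals
nrow(res$ds_pairs)    # number of pairs that survive the downsampling
```

Because a pair survives only when both of its members are retained, the number of pairs usually shrinks faster than the number of individuals.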

downsample_pairs(S, P, n)

Arguments

S: the tibble of samples, with at least the columns ID and samp_years_list. Typically this is the samples component of the list returned by slurp_spip().
P: the tibble of pairs. Typically this is the tibble returned by compile_related_pairs().
n: the desired number of individuals (or, more precisely, instances; see below) to retain in the sample.

Value

A list with two components:

ds_samples: A tibble like S, but with individuals randomly removed so that only n remain.
ds_pairs: A tibble like P, but without any pair that includes an individual not retained in the sample.
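One way to sanity-check a returned list is to confirm that every member of every pair is still present in the retained samples. The helper below is a hypothetical sketch (it is not part of CKMRpop) and assumes the pair columns are named `id_1` and `id_2`, as in the example output on this page. It is demonstrated on small hand-built lists so that it runs without the package.

```r
# Hypothetical helper: TRUE if both members of every pair are retained samples
pairs_are_consistent <- function(result) {
  pair_ids <- c(result$ds_pairs$id_1, result$ds_pairs$id_2)
  all(pair_ids %in% result$ds_samples$ID)
}

# A consistent toy result: both members of the pair were retained
toy_ok <- list(
  ds_samples = data.frame(ID = c("a", "b", "c"), stringsAsFactors = FALSE),
  ds_pairs   = data.frame(id_1 = "a", id_2 = "c", stringsAsFactors = FALSE)
)

# An inconsistent toy result: "z" is not among the retained samples
toy_bad <- list(
  ds_samples = data.frame(ID = c("a", "b", "c"), stringsAsFactors = FALSE),
  ds_pairs   = data.frame(id_1 = "a", id_2 = "z", stringsAsFactors = FALSE)
)

pairs_are_consistent(toy_ok)
pairs_are_consistent(toy_bad)
```

The same check could be applied to the list returned by downsample_pairs().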

Examples

# prepare some input: the sampled individuals and their related pairs
S <- three_pops_with_mig_slurped_results$samples
P <- compile_related_pairs(S)
# randomly retain n = 500 and keep only the pairs among those retained
result <- downsample_pairs(S, P, n = 500)
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the CKMRpop package.
#>   Please report the issue to the authors.

# print the result
result
#> $ds_samples
#> # A tibble: 500 × 13
#>    ID        samp_years_list sex   born_year born_pop pop_pre pop_post pop_dur
#>    <chr>     <list>          <chr>     <int>    <int> <chr>   <chr>    <chr>  
#>  1 F10_0_28  <int [1]>       F            10        0 NA      0        NA     
#>  2 F10_0_37  <int [1]>       F            10        0 0       NA       NA     
#>  3 F10_0_50  <int [1]>       F            10        0 0       NA       NA     
#>  4 F10_0_51  <int [1]>       F            10        0 NA      0        NA     
#>  5 F10_0_60  <int [1]>       F            10        0 0       NA       NA     
#>  6 F10_0_83  <int [1]>       F            10        0 0       NA       NA     
#>  7 F10_0_86  <int [1]>       F            10        0 0       NA       NA     
#>  8 F10_1_108 <int [1]>       F            10        1 1       NA       NA     
#>  9 F10_1_12  <int [1]>       F            10        1 1       NA       NA     
#> 10 F10_1_34  <int [1]>       F            10        1 NA      1        NA     
#> # ℹ 490 more rows
#> # ℹ 5 more variables: samp_years_list_pre <list>, samp_years_list_dur <list>,
#> #   samp_years_list_post <list>, ancestors <list>, relatives <list>
#> 
#> $ds_pairs
#> # A tibble: 183 × 33
#>    id_1      id_2     conn_comp dom_relat max_hit dr_hits   upper_member
#>    <chr>     <chr>        <dbl> <chr>       <int> <list>           <int>
#>  1 F10_0_28  F10_0_86         1 Si              1 <int [2]>           NA
#>  2 F10_0_28  M10_0_29         1 Si              1 <int [2]>           NA
#>  3 F10_0_50  F10_0_51         2 Si              1 <int [2]>           NA
#>  4 F10_0_51  M10_0_49         2 Si              1 <int [2]>           NA
#>  5 F10_0_83  M10_0_7          3 Si              1 <int [2]>           NA
#>  6 F10_1_108 F10_1_7          4 Si              1 <int [2]>           NA
#>  7 F10_1_12  M11_1_56         5 Si              1 <int [2]>           NA
#>  8 F10_1_12  M7_1_112         5 PO              1 <int [2]>            2
#>  9 F10_1_34  M10_1_24         6 Si              1 <int [2]>           NA
#> 10 F10_1_34  M10_1_27         6 Si              1 <int [2]>           NA
#> # ℹ 173 more rows
#> # ℹ 26 more variables: times_encountered <int>,
#> #   primary_shared_ancestors <list>, psa_tibs <list>, pop_pre_1 <chr>,
#> #   pop_post_1 <chr>, pop_dur_1 <chr>, pop_pre_2 <chr>, pop_post_2 <chr>,
#> #   pop_dur_2 <chr>, sex_1 <chr>, sex_2 <chr>, born_year_1 <int>,
#> #   born_year_2 <int>, samp_years_list_pre_1 <list>, samp_years_list_1 <list>,
#> #   samp_years_list_dur_1 <list>, samp_years_list_post_1 <list>, …
#>