downsample_pairs.Rd
This randomly discards individuals from the sample until the desired number of samples remains. It then returns only those pairs in which both members are among the retained samples.
downsample_pairs(S, P, n)
S
: The tibble of samples, with at least the columns ID and samp_years_list. Typically this will be what is returned in the samples component from slurp_spip().
P
: The tibble of pairs. Typically this will be what has been returned from compile_related_pairs().
n
: The desired number of individuals (or instances, really, see below) to retain in the sample.
This returns a list with two components:
ds_samples
: A tibble like S, except with individuals randomly removed so that only n are left.
ds_pairs
: A tibble like P, except with any pairs removed that include individuals not retained in the sample.
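The two steps described above (randomly keep n individuals, then keep only pairs whose members were both retained) can be sketched in base R on toy data. This is only an illustration under stated assumptions, not the package's implementation: `downsample_sketch()`, `S_toy` and `P_toy` are hypothetical names, each individual is assumed to appear exactly once in the sample, and the real function may treat individuals sampled on several occasions differently.

```r
# Toy stand-ins for S and P (hypothetical data, not from the package)
S_toy <- data.frame(ID = c("A", "B", "C", "D", "E"), stringsAsFactors = FALSE)
P_toy <- data.frame(
  id_1 = c("A", "A", "B", "D"),
  id_2 = c("B", "C", "E", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper illustrating the idea; assumes one row per individual in S
downsample_sketch <- function(S, P, n) {
  stopifnot(n <= nrow(S))
  # randomly choose the n individuals to retain
  keep <- sample(S$ID, size = n)
  ds_samples <- S[S$ID %in% keep, , drop = FALSE]
  # keep a pair only if BOTH of its members were retained
  ds_pairs <- P[P$id_1 %in% keep & P$id_2 %in% keep, , drop = FALSE]
  list(ds_samples = ds_samples, ds_pairs = ds_pairs)
}

set.seed(1)
toy <- downsample_sketch(S_toy, P_toy, n = 3)
```

Because a pair survives only when both members are kept, the number of retained pairs generally shrinks faster than the number of retained samples.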
# prepare some input
S <- three_pops_with_mig_slurped_results$samples
P <- compile_related_pairs(three_pops_with_mig_slurped_results$samples)
result <- downsample_pairs(S, P, n = 500)
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#> always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the CKMRpop package.
#> Please report the issue to the authors.
# print the result
result
#> $ds_samples
#> # A tibble: 500 × 13
#> ID samp_years_list sex born_year born_pop pop_pre pop_post pop_dur
#> <chr> <list> <chr> <int> <int> <chr> <chr> <chr>
#> 1 F10_0_19 <int [1]> F 10 0 0 NA NA
#> 2 F10_0_33 <int [1]> F 10 0 0 NA NA
#> 3 F10_0_37 <int [1]> F 10 0 0 NA NA
#> 4 F10_0_47 <int [1]> F 10 0 0 NA NA
#> 5 F10_0_50 <int [1]> F 10 0 0 NA NA
#> 6 F10_0_60 <int [1]> F 10 0 0 NA NA
#> 7 F10_1_104 <int [1]> F 10 1 1 NA NA
#> 8 F10_1_108 <int [1]> F 10 1 1 NA NA
#> 9 F10_1_34 <int [1]> F 10 1 NA 1 NA
#> 10 F10_1_5 <int [1]> F 10 1 NA 1 NA
#> # ℹ 490 more rows
#> # ℹ 5 more variables: samp_years_list_pre <list>, samp_years_list_dur <list>,
#> # samp_years_list_post <list>, ancestors <list>, relatives <list>
#>
#> $ds_pairs
#> # A tibble: 162 × 33
#> id_1 id_2 conn_comp dom_relat max_hit dr_hits upper_member
#> <chr> <chr> <dbl> <chr> <int> <list> <int>
#> 1 F10_0_33 M10_0_49 1 Si 1 <int [2]> NA
#> 2 F10_0_60 M9_0_18 2 Si 1 <int [2]> NA
#> 3 F10_1_104 F15_1_91 3 PO 1 <int [2]> 1
#> 4 F10_1_34 F10_1_58 4 Si 1 <int [2]> NA
#> 5 F10_1_34 M10_1_24 4 Si 1 <int [2]> NA
#> 6 F10_1_34 M10_1_27 4 Si 1 <int [2]> NA
#> 7 F10_1_5 M10_1_103 5 Si 1 <int [2]> NA
#> 8 F10_1_5 M10_1_132 5 Si 1 <int [2]> NA
#> 9 F10_1_60 M10_1_63 6 Si 1 <int [2]> NA
#> 10 F10_1_60 M10_1_75 6 Si 1 <int [2]> NA
#> # ℹ 152 more rows
#> # ℹ 26 more variables: times_encountered <int>,
#> # primary_shared_ancestors <list>, psa_tibs <list>, pop_pre_1 <chr>,
#> # pop_post_1 <chr>, pop_dur_1 <chr>, pop_pre_2 <chr>, pop_post_2 <chr>,
#> # pop_dur_2 <chr>, sex_1 <chr>, sex_2 <chr>, born_year_1 <int>,
#> # born_year_2 <int>, samp_years_list_pre_1 <list>, samp_years_list_1 <list>,
#> # samp_years_list_dur_1 <list>, samp_years_list_post_1 <list>, …
#>