使用 dplyr 在 r 中对一组进行多次采样

Question

我正在尝试在每个组中挑选样本：

df <- data.frame(ID=c(1,1,1,2,2,2), score=c(10,20,30,40,50,60))

  ID score
1  1    10
2  1    20
3  1    30
4  2    40
5  2    50
6  2    60

df %>% group_by(ID) %>% sample_n(2)

     ID score
1     1    20
2     1    30
3     2    50
4     2    40

但我想为每个 ID 多次 n，例如 2 次以获得如下内容：

     ID score sample_num
1     1    20          1
2     1    30          1
3     1    20          2
4     1    10          2
5     2    50          1
6     2    40          1
7     2    60          2
8     2    40          2

每个样本集都应该做到不放回。有没有办法在 dplyr 中做到这一点？我能想到的很长的方法是做一个for循环，每次迭代创建一个df，然后在最后将所有df组合在一起。

Answer 1

library(tidyverse)
set.seed(1)
n_repeat <- 2
n_sample <- 2

df <- data.frame(ID=c(1,1,1,2,2,2), score=c(10,20,30,40,50,60))

df %>% 
  group_nest(ID) %>% 
  transmute(ID,
            Score = map(data, ~as.vector(replicate(n_repeat, sample(.x$score, 2))))) %>% 
  unnest(Score) %>%
  group_by(ID) %>% 
  mutate(sample_no = rep(seq(n_repeat), each = n_sample)) %>% 
  ungroup()
#> # A tibble: 8 x 3
#>      ID Score sample_no
#>   <dbl> <dbl>     <int>
#> 1     1    10         1
#> 2     1    20         1
#> 3     1    30         2
#> 4     1    10         2
#> 5     2    50         1
#> 6     2    40         1
#> 7     2    60         2
#> 8     2    40         2

^{由 reprex package (v2.0.0)}

于 2021 年 6 月 11 日创建

Answer 2

如果你必须这样做 N 次，就这样做

为时间

N

map_dfr 将遍历其第一个参数，即 seq_len(N) ，执行您手动执行的操作，再改变一个变量，该变量将存储 seq_len(N) 的相应值，即 .x 在 lambda 公式中，对于每次迭代。
最终结果将在数据框中编译，因为我们使用的是 map

map_dfr

df <- data.frame(ID=c(1,1,1,2,2,2), score=c(10,20,30,40,50,60))

library(tidyverse)
N <- 7
map_dfr(seq_len(N), ~df %>% group_by(ID) %>% sample_n(2) %>% 
             mutate(sample_no = .x))
#> # A tibble: 28 x 3
#> # Groups:   ID [2]
#>       ID score sample_no
#>    <dbl> <dbl>     <int>
#>  1     1    20         1
#>  2     1    10         1
#>  3     2    60         1
#>  4     2    50         1
#>  5     1    30         2
#>  6     1    10         2
#>  7     2    60         2
#>  8     2    40         2
#>  9     1    10         3
#> 10     1    20         3
#> # ... with 18 more rows

^{由 reprex package (v2.0.0)}

于 2021 年 6 月 11 日创建

Answer 3

library(tidyverse)

df <- data.frame(ID=c(1,1,1,2,2,2), score=c(10,20,30,40,50,60))

set.seed(123)

#option 1
rerun(2, df %>% group_by(ID) %>% sample_n(2,replace = FALSE)) %>%
    map2(1:length(.), ~mutate(.x, sample_n = .y)) %>% 
    reduce(bind_rows) %>% 
    arrange(ID)
#> # A tibble: 8 x 3
#> # Groups:   ID [2]
#>      ID score sample_n
#>   <dbl> <dbl>    <int>
#> 1     1    30        1
#> 2     1    10        1
#> 3     1    30        2
#> 4     1    20        2
#> 5     2    60        1
#> 6     2    50        1
#> 7     2    50        2
#> 8     2    60        2

#option 2 
map(1:2, ~df %>% group_by(ID) %>%
        sample_n(2,replace = FALSE) %>%
        mutate(sample_num = .x)) %>% 
    reduce(bind_rows) %>% 
    arrange(ID)
#> # A tibble: 8 x 3
#> # Groups:   ID [2]
#>      ID score sample_num
#>   <dbl> <dbl>      <int>
#> 1     1    30          1
#> 2     1    10          1
#> 3     1    10          2
#> 4     1    20          2
#> 5     2    50          1
#> 6     2    60          1
#> 7     2    60          2
#> 8     2    50          2

^{由 reprex package (v2.0.0)}

于 2021 年 6 月 11 日创建

使用 dplyr 在 r 中对一组进行多次采样

Sample within a group multiple times in r using dplyr

iteration

r

sample

dplyr