Sample from groups, but n varies per group in R

Question

I am trying to randomly sample n rows from each level of a grouping variable, where n differs from group to group. For example:

library(dplyr)
iris <- iris %>% mutate(len_bin=cut(Sepal.Length,seq(0,8,by=1)))

These are the factor levels that serve as my grouping variable:

table(iris$len_bin)

(4,5] (5,6] (6,7] (7,8] 
   32    57    49    12

Is there a way to sample from these groups n times, where n is the number of times each level appears in this vector:

x <- c("(4,5]","(5,6]","(5,6]","(5,6]","(6,7]")
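To make the target counts explicit, tabulating x gives the n wanted for each group. This is only a small sketch; the name n_per_group is illustrative and not part of the original post.

```r
x <- c("(4,5]", "(5,6]", "(5,6]", "(5,6]", "(6,7]")

# Count how often each bin label occurs in x; these counts are the per-group n
n_per_group <- table(x)
n_per_group
```

Here "(5,6]" occurs three times, so three rows should come from that bin, and one row each from "(4,5]" and "(6,7]".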

The result should look like this:

# Groups:   len_bin [4]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species    len_bin
         <dbl>       <dbl>        <dbl>       <dbl> <fct>      <fct>  
1          5           2            3.5         1   versicolor (4,5]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
2          5.3         3.7          1.5         0.2 setosa     (5,6]  
3          6.5         3            5.8         2.2 virginica  (6,7]

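I managed to do this with a for loop that calls sample_n() for each element of the vector, but I assume there must be a faster way. For example, can I define n per group inside sample_n()?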

Answer 1

In base R you can do this:

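library(dplyr)  # only needed for %>% and mutate() in the setup line

iris <- iris %>% mutate(len_bin = cut(Sepal.Length, seq(4, 8, by = 1)))
x <- c("(4,5]","(5,6]","(5,6]","(5,6]","(6,7]")

# For each bin, draw as many rows as that bin appears in x (0 for bins absent from x)
l <- mapply(\(x, y) x[sample(nrow(x), y), ], 
            split(iris, iris$len_bin), 
            c(table(factor(x, levels = levels(iris$len_bin)))), 
            SIMPLIFY = FALSE)

do.call(rbind.data.frame, l)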

#         Sepal.Length Sepal.Width Petal.Length Petal.Width    Species len_bin
#(4,5]             5.0         3.2          1.2         0.2     setosa   (4,5]
#(5,6].17          5.4         3.9          1.3         0.4     setosa   (5,6]
#(5,6].63          6.0         2.2          4.0         1.0 versicolor   (5,6]
#(5,6].97          5.7         2.9          4.2         1.3 versicolor   (5,6]
#(6,7]             6.9         3.1          5.1         2.3  virginica   (6,7]
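Since the question asks about a dplyr route, the same idea might be sketched with group_modify() and slice_sample(). This is an assumption-laden alternative, not the answerer's method: the names dat, n_per_bin and sampled are made up for illustration, and it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Work on a copy (hypothetical name `dat`) so the built-in iris is left untouched
dat <- iris %>% mutate(len_bin = cut(Sepal.Length, seq(4, 8, by = 1)))
x <- c("(4,5]", "(5,6]", "(5,6]", "(5,6]", "(6,7]")

# Named integer vector: bin label -> number of rows to draw
n_per_bin <- c(table(x))

sampled <- dat %>%
  filter(len_bin %in% names(n_per_bin)) %>%   # drop bins that x does not request
  group_by(len_bin) %>%
  # .x holds the rows of the current group, .y its one-row key
  group_modify(~ slice_sample(.x, n = n_per_bin[[as.character(.y$len_bin)]])) %>%
  ungroup()

sampled
```

Rows are drawn without replacement here. If a requested count could exceed the size of a group, passing replace = TRUE to slice_sample() would be one option.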
