Random Sample From a Dataframe With Specific Count

Question

This question is best illustrated with an example.

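Suppose I have a data frame df and a binary variable b (b takes the value 0 or 1). How can I draw a random sample of size 10 from this data frame so that the sample contains 2 rows where b = 0 and 8 rows where b = 1?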

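Now, I know I can use df[sample(nrow(df), 10), ] to get part of the answer, but that gives me a random number of 0 and 1 instances. How can I specify exactly how many 0 and 1 instances I get while still taking a random sample?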

Answer 1

Here is an example of how I would do this: take two samples and combine them. I wrote a simple function so that you can "just take one sample".

With a vector:

pop <- sample(c(0,1), 100, replace = TRUE)

yoursample <- function(pop, n_zero, n_one){
  c(sample(pop[pop == 0], n_zero),
    sample(pop[pop == 1], n_one))
}

yoursample(pop, n_zero = 2, n_one = 8)
[1] 0 0 1 1 1 1 1 1 1 1

Or, if you are working with a data frame that has a unique index column called id:

# Where d1 is your data you are summarizing with mean and sd
dat <- data.frame(
    id = 1:100,
    val = sample(c(0,1), 100, replace = TRUE),
    d1 = runif(100))

yoursample <- function(dat, n_zero, n_one){
  c(sample(dat[dat$val == 0,"id"], n_zero),
    sample(dat[dat$val == 1,"id"], n_one))
}

sample_ids <- yoursample(dat, n_zero = 2, n_one = 8)  
sample_ids

mean(dat[dat$id %in% sample_ids,"d1"])
sd(dat[dat$id %in% sample_ids,"d1"])
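A caveat the answer does not mention: `sample(x, n)` treats a numeric `x` of length one (with `x >= 1`) as `1:x`. So if only one row has `val == 0`, `sample(dat[dat$val == 0, "id"], n_zero)` silently draws from `1:id` instead of returning that single id. The `?sample` help page suggests a position-based `resample()` wrapper for exactly this case. The sketch below uses it and also returns the sampled rows directly via `which()`, so no `id` column is needed. `sample_rows()` is a hypothetical name, and the data and seed are only for illustration:

```r
# resample(): position-based sampling, as suggested on the ?sample help page.
# Indexing by position means a length-one x is never expanded to 1:x.
resample <- function(x, size) x[sample.int(length(x), size)]

sample(57, 1)    # draws from 1:57, not from the single id 57
resample(57, 1)  # always returns 57

# sample_rows() is a hypothetical helper, not part of the answer above:
# it samples row positions within each group and returns the rows themselves.
sample_rows <- function(dat, n_zero, n_one) {
  zero_rows <- which(dat$val == 0)
  one_rows  <- which(dat$val == 1)
  dat[c(resample(zero_rows, n_zero), resample(one_rows, n_one)), , drop = FALSE]
}

set.seed(42)
dat <- data.frame(
  id  = 1:100,
  val = sample(c(0, 1), 100, replace = TRUE),
  d1  = runif(100))

picked <- sample_rows(dat, n_zero = 2, n_one = 8)
table(picked$val)  # 2 zeros, 8 ones
mean(picked$d1)
sd(picked$d1)
```

Working with row positions also avoids the `%in%` lookup, since the summary statistics can be computed on `picked` directly.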

Answer 2

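Here is one suggestion:

First create a sample of 0s and 1s with an id column. Then sample 2 rows and 8 rows from df according to the condition and bind them together: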

library(tidyverse)

set.seed(123)
df <- tibble(value = sample(0:1, size = 50, replace = TRUE)) %>%
  mutate(id = row_number())

df1 <- df[sample(which(df$value == 0), 2), ]
df2 <- df[sample(which(df$value == 1), 8), ]

df_final <- bind_rows(df1, df2)

   value    id
   <int> <int>
 1     0    14
 2     0    36
 3     1    21
 4     1    24
 5     1     2
 6     1    50
 7     1    49
 8     1    41
 9     1    28
10     1    33
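Because `bind_rows()` stacks `df1` on top of `df2`, the two `value == 0` rows always come first. If the order of the sample matters downstream, one more permutation of the combined rows takes care of that. A minimal sketch of the same steps in base R, with `rbind()` standing in for `bind_rows()` so no packages are needed; `df_shuffled` is an illustrative name:

```r
set.seed(123)
df <- data.frame(value = sample(0:1, size = 50, replace = TRUE))
df$id <- seq_len(nrow(df))

# Draw 2 rows with value == 0 and 8 rows with value == 1, then stack them
df_final <- rbind(
  df[sample(which(df$value == 0), 2), ],
  df[sample(which(df$value == 1), 8), ]
)

# Permute the stacked rows so the value == 0 rows are not always on top
df_shuffled <- df_final[sample(nrow(df_final)), ]
df_shuffled
```

The shuffle only changes row order; the composition (2 zeros, 8 ones) is the same as in `df_final`.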

Answer 3

library(tidyverse)

set.seed(123)

df <- data.frame(a = letters,
                 b = sample(c(0, 1), 26, replace = TRUE))

bind_rows(
  df %>% 
    filter(b == 0) %>% 
    sample_n(2),
  df %>% 
    filter(b == 1) %>% 
    sample_n(8)
) %>% 
  arrange(a)

   a b
1  d 1
2  g 1
3  h 1
4  l 1
5  m 1
6  o 1
7  p 0
8  q 1
9  s 0
10 v 1
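Both tidyverse answers hard-code one filter-and-sample step per value. If there are more levels, or the counts come from somewhere else, the per-group draw can be driven by a named vector of counts instead. A sketch in base R under these assumptions: `sample_by_count()` is a hypothetical helper, the names of `counts` are matched against `as.character()` of the grouping column, and the toy data uses `rep()` so that each value of `b` is guaranteed to occur 13 times:

```r
# sample_by_count() is a hypothetical helper: for each name lvl in `counts`,
# draw counts[[lvl]] rows (without replacement) where df[[var]] == lvl,
# then stack the pieces.
sample_by_count <- function(df, var, counts) {
  pieces <- lapply(names(counts), function(lvl) {
    rows <- which(as.character(df[[var]]) == lvl)
    k <- counts[[lvl]]
    if (length(rows) < k) {
      stop(sprintf("only %d rows with %s == %s, but %d requested",
                   length(rows), var, lvl, k))
    }
    # Index by position so a single matching row is not expanded to 1:x
    df[rows[sample.int(length(rows), k)], , drop = FALSE]
  })
  do.call(rbind, pieces)
}

set.seed(123)
df <- data.frame(a = letters, b = rep(c(0, 1), length.out = 26))

out <- sample_by_count(df, "b", c("0" = 2, "1" = 8))
out[order(out$a), ]
```

Asking for more rows than a group contains stops with an error rather than silently returning a smaller sample.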


Tags: random, r, dataframe