Sample according to population weights within groups
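I have a data.frame from which I need to draw a sample. For each year, I want 50 observations drawn according to population weights. Here is some example code: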
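library(dplyr)

set.seed(1234)
# 20 years x 50 rows; each row gets a random group and that group's weight
ex.df <- data.frame(value = runif(1000),
                    year = rep(1991:2010, each = 50),
                    group = sample(c("A", "B", "C"), 1000, replace = TRUE)) %>%
  # NA_real_ as the fallback keeps pop.weight numeric
  mutate(pop.weight = ifelse(group == "A", 0.5,
                      ifelse(group == "B", 0.3,
                      ifelse(group == "C", 0.2, NA_real_))))

set.seed(1234)
# Draw 50 rows per year, weighted by pop.weight (without replacement by default)
test <- ex.df %>%
  group_by(year) %>%
  sample_n(50, weight = pop.weight) %>%
  ungroup()

table(test$group) / sum(table(test$group))
#     A     B     C
# 0.329 0.319 0.352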
Group A should make up about 50% of the sample, B about 30%, and C about 20%. What am I missing?
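Set replace = TRUE. You want 50 observations per year, but ex.df contains only 50 observations per year, so with replace = FALSE every row of a year has to be selected exactly once. The weights then only affect the order of the rows, and the result is the same rows in a different order.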
set.seed(1234)
test <- ex.df %>%
  group_by(year) %>%
  sample_n(50, weight = pop.weight, replace = TRUE) %>%
  ungroup()

table(test$group) / sum(table(test$group))
#     A     B     C
# 0.509 0.299 0.192
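The point about replace = FALSE can be checked on a small case. The following is a minimal sketch, assuming a single hypothetical "year" of exactly 50 rows; the names toy, id, and w are made up for illustration and are not part of the original question.

```r
library(dplyr)

set.seed(1234)
# Hypothetical toy data: one "year" with exactly 50 rows and unequal weights
toy <- data.frame(id = 1:50,
                  group = sample(c("A", "B", "C"), 50, replace = TRUE)) %>%
  mutate(w = case_when(group == "A" ~ 0.5,
                       group == "B" ~ 0.3,
                       group == "C" ~ 0.2))

# 50 of 50 rows without replacement: every row is selected exactly once
no_repl <- sample_n(toy, 50, weight = w)
same_rows <- setequal(no_repl$id, toy$id)
same_props <- all(table(no_repl$group) == table(toy$group))

# With replacement, heavily weighted rows can be drawn more than once
with_repl <- sample_n(toy, 50, weight = w, replace = TRUE)
any_repeats <- anyDuplicated(with_repl$id) > 0
```

Here same_rows and same_props should both be TRUE, which is why the weights had no visible effect on the group proportions in the question, while any_repeats shows that sampling with replacement is no longer forced to return each row once.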
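Alternatively, you can increase the number of observations per year in ex.df. In the example below I changed the number of observations per year to 5000, and the proportions in test look reasonable even without replacement.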
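set.seed(1234)
# 20 years x 5000 rows; the 1000 sampled group labels are recycled
# by data.frame() to fill all 100000 rows
ex.df <- data.frame(value = runif(100000),
                    year = rep(1991:2010, each = 5000),
                    group = sample(c("A", "B", "C"), 1000, replace = TRUE)) %>%
  # NA_real_ as the fallback keeps pop.weight numeric
  mutate(pop.weight = ifelse(group == "A", 0.5,
                      ifelse(group == "B", 0.3,
                      ifelse(group == "C", 0.2, NA_real_))))

set.seed(1234)
# 50 of 5000 rows per year, so sampling without replacement is no longer forced
test <- ex.df %>%
  group_by(year) %>%
  sample_n(50, weight = pop.weight) %>%
  ungroup()

table(test$group) / sum(table(test$group))
#     A     B     C
# 0.515 0.276 0.209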
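As a side note, in dplyr 1.0.0 and later sample_n() is superseded by slice_sample(). The sketch below shows what the first fix might look like with that function, assuming a recent dplyr is installed; it rebuilds the original 50-rows-per-year data and uses case_when() instead of nested ifelse(), which is my substitution rather than the code from the answer. No output is shown because the exact proportions depend on the R and dplyr versions.

```r
library(dplyr)

set.seed(1234)
# Same layout as the question: 20 years x 50 rows
ex.df <- data.frame(value = runif(1000),
                    year = rep(1991:2010, each = 50),
                    group = sample(c("A", "B", "C"), 1000, replace = TRUE)) %>%
  mutate(pop.weight = case_when(group == "A" ~ 0.5,
                                group == "B" ~ 0.3,
                                group == "C" ~ 0.2))

# Weighted draw of 50 rows per year, with replacement
test <- ex.df %>%
  group_by(year) %>%
  slice_sample(n = 50, weight_by = pop.weight, replace = TRUE) %>%
  ungroup()

# Share of each group in the sample
prop.table(table(test$group))
```

prop.table() is used here as a shorter equivalent of dividing the table by its sum.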