R - dplyr bootstrap issue
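I'm having trouble understanding how to use the bootstrap() function (from broom) correctly in a dplyr pipeline.
What I want is to generate a bootstrap distribution of the difference in means between two randomly assigned groups, for example: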
library(dplyr)
library(broom)
data(mtcars)
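# Randomly assign each car to group 0 or 1, then compute
# mean(disp) in group 1 minus mean(disp) in group 0
mtcars %>%
  mutate(treat = sample(c(0, 1), 32, replace = TRUE)) %>%
  group_by(treat) %>%
  summarise(m = mean(disp)) %>%
  summarise(m = m[treat == 1] - m[treat == 0])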
The problem is that I need to repeat this 100, 1000 or more times.
With replicate(), I can do:
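# Wrap the pipeline in a function so it can be repeated
frep <- function(mtcars) mtcars %>%
  mutate(treat = sample(c(0, 1), 32, replace = TRUE)) %>%
  group_by(treat) %>%
  summarise(m = mean(disp)) %>%
  summarise(m = m[treat == 1] - m[treat == 0])
# Run it 1000 times and flatten the results into one numeric vector
replicate(1000, frep(mtcars = mtcars), simplify = TRUE) %>% unlist()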
and get the distribution of the differences.
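For comparison, the same repeated random-assignment loop can be written in base R without the dplyr pipeline. This is a minimal sketch, assuming the goal is just a numeric vector of mean(disp | treat == 1) - mean(disp | treat == 0) values; the helper name diff_means is hypothetical. Because each call returns a single number, replicate() already returns a plain vector, so no unlist() is needed.

```r
data(mtcars)

# Hypothetical helper: randomly assign each row to group 0 or 1,
# then return mean(disp) in group 1 minus mean(disp) in group 0.
# (If every row happened to land in one group, the result would be NaN.)
diff_means <- function(df) {
  treat <- sample(c(0, 1), nrow(df), replace = TRUE)
  mean(df$disp[treat == 1]) - mean(df$disp[treat == 0])
}

set.seed(1)                          # make the random draws reproducible
dist <- replicate(1000, diff_means(mtcars))

summary(dist)                        # one difference per repetition
```

Using nrow(df) instead of a hard-coded 32 keeps the helper usable on data frames of any size.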
But I really don't know how to use bootstrap() here. How should I start?
mtcars %>%
bootstrap(10) %>%
mutate(treat = sample(c(0, 1), 32, replace = T))
mtcars %>%
bootstrap(10) %>%
do(tidy(treat = sample(c(0, 1), 32, replace = T)))
Neither of these really works. Where should I put the bootstrap() step in the pipe?
Thanks.
In the do() step, we wrap each resample in data.frame() and create the 'treat' column there; then we can group by 'replicate' and 'treat' and summarise to get the output column.
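mtcars %>%
  bootstrap(10) %>%                          # 10 resamples, labelled by 'replicate'
  # add a random 0/1 'treat' column to each resample
  do(data.frame(., treat = sample(c(0, 1), 32, replace = TRUE))) %>%
  group_by(replicate, treat) %>%
  summarise(m = mean(disp)) %>%              # result is still grouped by 'replicate'
  summarise(m = m[treat == 1] - m[treat == 0])
# or, since 0 comes first and 1 second within each replicate, we can also use
# summarise(m = last(m) - first(m))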
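To make it clearer what the pipeline above computes, here is a base-R sketch that builds the same long layout by hand: m resamples of the rows drawn with replacement, each tagged with a replicate id, a random 0/1 treat column added to every resample, and then the treat 1 minus treat 0 difference per replicate. It assumes, as the answer relies on, that bootstrap(m) resamples all rows with replacement and labels them by replicate; it is an illustration, not broom's implementation, and the helper name boot_long is hypothetical. As far as I know, newer broom releases have also deprecated bootstrap() in favour of the rsample package, so a package-free version can be handy.

```r
data(mtcars)

# Hypothetical helper: stack m bootstrap resamples of df (rows drawn with
# replacement), tag each resample with a `replicate` id, and add a random
# 0/1 `treat` column to every resampled row.
boot_long <- function(df, m) {
  pieces <- lapply(seq_len(m), function(r) {
    rows <- sample(nrow(df), nrow(df), replace = TRUE)
    out <- df[rows, , drop = FALSE]
    out$replicate <- r
    out$treat <- sample(c(0, 1), nrow(df), replace = TRUE)
    out
  })
  do.call(rbind, pieces)
}

set.seed(42)
long <- boot_long(mtcars, 10)

# Mean of disp for every (replicate, treat) cell: a 10 x 2 matrix
# with rows named "1".."10" and columns named "0" and "1".
cell_means <- tapply(long$disp, list(long$replicate, long$treat), mean)

# Treat 1 minus treat 0 within each replicate: one value per replicate.
diffs <- cell_means[, "1"] - cell_means[, "0"]
diffs
```

The tapply() call plays the role of group_by(replicate, treat) %>% summarise(m = mean(disp)), and the column subtraction corresponds to the final summarise().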