Why do the results of bootstrapping methods differ when the same seed is used?

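I want to generate a 95% confidence interval for the R2 of a linear model. While developing the code, and using the same seed for both approaches, I found that bootstrapping manually does not give me the same results as the boot function from the boot package. Am I doing something wrong, or is there a reason for this?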

Separately, I tried to compute the 95% CI with the confint function, but I get the error "$ operator is invalid for atomic vectors". Is there a way to avoid this error?

Here is a reproducible example that illustrates both problems:

#creating the dataframe
a <- rpois(n = 100, lambda = 10)
b <- rnorm(n = 100, mean = 5, sd = 1)
DF<- data.frame(a,b)

# Bootstrapping manually
set.seed(123)
x <- nrow(DF)
B_manually <- data.frame(replicate(100, summary(lm(a ~ b, data = DF[sample(x, replace = TRUE), ]))$r.squared))
names(B_manually)[1] <- "r_squared"

# Bootstrapping using the boot function from the boot package
set.seed(123)
library(boot)
B_boot <- boot(DF, function(data,indices)
  summary(lm(a~b, data[indices,]))$r.squared,R=100)

head(B_manually) == head(B_boot$t)
r_squared
1     FALSE
2     FALSE
3     FALSE
4     FALSE
5     FALSE
6     FALSE
# Why do the results of the manual approach and the boot function differ if I'm using the same seed?

# Second question: using the confint function to determine the 95% CI gives me an error
confint(B_manually$r_squared, level = 0.95, method = "quantile")
confint(B_boot$t, level = 0.95, method = "quantile")
#Error: $ operator is invalid for atomic vectors

# NOTE: I already used boot.ci and the quantile function to determine the 95%
# confidence interval, but the two results differ from each other, so I just
# wanted to compare them with the confint function.
quantile(B_boot$t, c(0.025, 0.975))
boot.ci(B_boot, index = 1, type = "perc")

Thanks in advance for your help!

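The boot package does not use replicate with sample to generate the indices. Check the importance.array function in the source code for boot (for a plain unweighted bootstrap, the relevant internal helper appears to be ordinary.array). It basically generates all the indices in one go, so there is no reason to assume that you will end up with the same indices or the same results.

Taking a step back, the purpose of the bootstrap is to use random resampling to obtain an estimate of your parameters, and you should get similar estimates from different implementations of the bootstrap.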
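
If you do want to match boot's replicates exactly, one option is to ask boot which rows it drew and recompute the statistic on those. The sketch below assumes an ordinary (non-parametric, unstratified) bootstrap run without parallelisation; r2 and r2_by_hand are names made up for this illustration, and boot.array(..., indices = TRUE) is the boot function that regenerates the index matrix from the seed stored in the boot object.

```r
library(boot)

# Same kind of data as in the question, with a seed so it is reproducible
set.seed(111)
DF <- data.frame(a = rpois(n = 100, lambda = 10),
                 b = rnorm(n = 100, mean = 5, sd = 1))

# Statistic in the form boot() expects: (data, indices) -> R^2
# (r2 is a hypothetical helper name used only in this sketch)
r2 <- function(data, indices) {
  summary(lm(a ~ b, data = data[indices, ]))$r.squared
}

set.seed(123)
B_boot <- boot(DF, r2, R = 100)

# Recover the R x n matrix of row indices used for the resamples
idx <- boot.array(B_boot, indices = TRUE)

# Recompute the statistic by hand on exactly those resamples
r2_by_hand <- apply(idx, 1, function(i) r2(DF, i))

# Compare the hand-computed replicates with the ones stored by boot()
all.equal(r2_by_hand, as.vector(B_boot$t))
```

Each row of idx is one resample, so the manual loop and boot are then working from the same indices rather than merely the same seed.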

For example, you can see that the distributions of R^2 are very similar:

set.seed(111)
a <- rpois(n = 100, lambda = 10)
b <- rnorm(n = 100, mean = 5, sd = 1)
DF<- data.frame(a,b)

set.seed(123)
x=length(DF$a) 
B_manually<- data.frame(replicate(999, summary(lm(a~b, data = DF[sample(x, replace = T),]))$r.squared))

library(boot)
B_boot <- boot(DF, function(data,indices)
  summary(lm(a~b, data[indices,]))$r.squared,R=999)

par(mfrow=c(2,1))
hist(B_manually[,1],breaks=seq(0,0.4,0.01),main="dist of R2 manual")
hist(B_boot$t,breaks=seq(0,0.4,0.01),main="dist of R2 boot")
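
The same comparison can be made numerically. This is a minimal sketch rather than a definitive recipe: it rebuilds the data so that it runs on its own, and r2 and r2_manual are names invented here. It puts the 95% percentile limits from both sets of replicates side by side and also prints boot.ci's percentile interval.

```r
library(boot)

set.seed(111)
DF <- data.frame(a = rpois(n = 100, lambda = 10),
                 b = rnorm(n = 100, mean = 5, sd = 1))

# Hypothetical helper: R^2 of a ~ b on the selected rows
r2 <- function(data, indices) {
  summary(lm(a ~ b, data = data[indices, ]))$r.squared
}

# Manual bootstrap: one sample() call per replicate
set.seed(123)
r2_manual <- replicate(999, r2(DF, sample(nrow(DF), replace = TRUE)))

# Bootstrap via the boot package
set.seed(123)
B_boot <- boot(DF, r2, R = 999)

# 95% percentile limits from each set of replicates, side by side
rbind(manual = quantile(r2_manual, c(0.025, 0.975)),
      boot   = quantile(B_boot$t, c(0.025, 0.975)))

# Percentile interval as computed by boot.ci, for comparison
boot.ci(B_boot, type = "perc")
```

As far as I know, boot.ci interpolates between order statistics in its own way, so small differences from quantile() on the same replicates are expected and do not indicate a mistake.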

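The function you are using, confint, is meant for lm objects and estimates confidence intervals for the coefficients; see the help page. It takes the standard error of the coefficient and multiplies it by the critical t-value to give you the confidence interval. You can check out this book page for the formula. The object you get from bootstrapping is not an lm object, so this function does not work on it. It is not meant for any other kind of estimate.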
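
To make that concrete, here is a small sketch of what confint computes for an lm fit: estimate plus or minus the critical t-value times the standard error. The names fit, est, se, t_crit and manual_ci are made up for this example. My understanding is that the error in the question arises because the default confint method calls coef() on its argument, which in turn tries object$coefficients, and that fails on a plain numeric vector.

```r
set.seed(111)
DF <- data.frame(a = rpois(n = 100, lambda = 10),
                 b = rnorm(n = 100, mean = 5, sd = 1))

fit <- lm(a ~ b, data = DF)

est    <- coef(fit)                          # coefficient estimates
se     <- sqrt(diag(vcov(fit)))              # their standard errors
t_crit <- qt(0.975, df = df.residual(fit))   # critical t-value for a 95% CI

# Estimate +/- critical value * standard error
manual_ci <- cbind(lower = est - t_crit * se,
                   upper = est + t_crit * se)

manual_ci
confint(fit, level = 0.95)   # should agree with manual_ci
```

For the bootstrapped R^2 values there are no coefficients or standard errors for confint to work with, so use quantile(B_boot$t, c(0.025, 0.975)) or boot.ci on the boot object instead.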