Why do the results of bootstrapping methods differ when the same seed is used?

Question

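I want to generate a 95% confidence interval for the R2 of a linear model. While developing the code and using the same seed for both approaches, I found that bootstrapping manually does not give me the same results as the boot function from the boot package. Am I doing something wrong, or why does this happen?

Separately, to compute the 95% CI I tried the confint function, but I get the error "$ operator is invalid for atomic vectors". Is there a way to avoid this error?

Here is a reproducible example that illustrates my concern: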

#creating the dataframe
a <- rpois(n = 100, lambda = 10)
b <- rnorm(n = 100, mean = 5, sd = 1)
DF<- data.frame(a,b)

#bootstrapping manually
set.seed(123)
x=length(DF$a) 
B_manually<- data.frame(replicate(100, summary(lm(a~b, data = DF[sample(x, replace = T),]))$r.squared))
names(B_manually)[1]<- "r_squared"

#Bootstrapping using the boot() function from the boot package
set.seed(123)
library(boot)
B_boot <- boot(DF, function(data,indices)
  summary(lm(a~b, data[indices,]))$r.squared,R=100)

head(B_manually) == head(B_boot$t)
#   r_squared
# 1     FALSE
# 2     FALSE
# 3     FALSE
# 4     FALSE
# 5     FALSE
# 6     FALSE
#Why do the results of the manual and boot() approaches differ if I'm using the same seed?

# 2nd question (Using the confint function to determine the 95 CI gives me an error)
confint(B_manually$r_squared, level = 0.95, method = "quantile")
confint(B_boot$t, level = 0.95, method = "quantile")
#Error: $ operator is invalid for atomic vectors

#NOTE: I already used boot.ci to determine the 95% confidence interval, as well as the
#quantile function, but the resulting CIs differ from each other,
#so I just wanted to compare them with the confint function.
quantile(B_boot$t, c(0.025,0.975))
boot.ci(B_boot, index=1,type="perc")

Thanks in advance for your help!

Answer 1

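The boot package does not use replicate and sample to generate the indices. See the importance.array function in the source code for boot; it essentially generates all the indices in one go. So there is no reason to assume you will end up with the same indices or the same results. Taking a step back, the purpose of the bootstrap is to use random sampling to obtain an estimate of your parameters, so you should get similar estimates from different implementations of the bootstrap.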
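To make the index argument concrete, here is a minimal sketch. It assumes that "all indices in one go" means a single draw of n * R indices reshaped into an R x n matrix; that layout is an assumption for illustration, not the package's actual code. The names idx_loop and idx_once are introduced here and do not come from boot.

```r
n <- 100  # number of rows in the data
R <- 5    # number of bootstrap replicates (kept small for the illustration)

# Approach 1: one call to sample() per replicate, as in the manual loop.
# replicate() returns an n x R matrix, so transpose to one row per replicate.
set.seed(123)
idx_loop <- t(replicate(R, sample(n, replace = TRUE)))

# Approach 2 (assumed layout): draw all n * R indices in a single call
# and reshape them into an R x n matrix, one row per replicate.
set.seed(123)
idx_once <- matrix(sample.int(n, n * R, replace = TRUE), nrow = R, ncol = n)

# Same seed, same matrix shape, but the draws land in different replicates.
same_first_replicate <- identical(idx_loop[1, ], idx_once[1, ])
print(same_first_replicate)
```

Because R fills a matrix column by column, the first row of idx_once is made up of draws that are R positions apart in the random stream, while the first row of idx_loop is the first n consecutive draws. The seed fixes the stream, not how each implementation assigns draws to replicates.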

For example, you can see that the distributions of R^2 are very similar:

set.seed(111)
a <- rpois(n = 100, lambda = 10)
b <- rnorm(n = 100, mean = 5, sd = 1)
DF<- data.frame(a,b)

set.seed(123)
x=length(DF$a) 
B_manually<- data.frame(replicate(999, summary(lm(a~b, data = DF[sample(x, replace = T),]))$r.squared))

library(boot)
B_boot <- boot(DF, function(data,indices)
  summary(lm(a~b, data[indices,]))$r.squared,R=999)

par(mfrow=c(2,1))
hist(B_manually[,1],breaks=seq(0,0.4,0.01),main="dist of R2 manual")
hist(B_boot$t,breaks=seq(0,0.4,0.01),main="dist of R2 boot")
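The histograms give a visual comparison; the sketch below compares the two sets of replicates numerically through their percentile intervals. It reuses the simulated data from this answer. The helper r2 and the names r2_manual, ci_manual and ci_boot are introduced here for illustration.

```r
library(boot)

set.seed(111)
a <- rpois(n = 100, lambda = 10)
b <- rnorm(n = 100, mean = 5, sd = 1)
DF <- data.frame(a, b)
n <- nrow(DF)

# Statistic in the form boot() expects: R^2 of the model refit on the resampled rows
r2 <- function(data, indices) summary(lm(a ~ b, data = data[indices, ]))$r.squared

# Manual bootstrap: one resample of the row indices per replicate
set.seed(123)
r2_manual <- replicate(999, r2(DF, sample(n, replace = TRUE)))

# Bootstrap with the boot package
set.seed(123)
B_boot <- boot(DF, r2, R = 999)

# Percentile intervals from both sets of replicates
ci_manual <- quantile(r2_manual, c(0.025, 0.975))
ci_boot <- quantile(B_boot$t, c(0.025, 0.975))
print(rbind(manual = ci_manual, boot = ci_boot))

# Percentile interval as computed by boot.ci
print(boot.ci(B_boot, index = 1, type = "perc"))
```

The two quantile() intervals should be close but not identical, since the resamples differ. boot.ci applies its own interpolation rule to the ordered replicates, so its percentile interval can also differ slightly from the default quantile() result on the same B_boot$t.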

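The confint function you are using is meant for lm objects and estimates confidence intervals for the coefficients; see the help page. It takes the standard error of each coefficient and multiplies it by the critical t-value to give you the confidence interval. You can check out this book page for the formula. The object produced by bootstrapping is not an lm object, so this function does not work on it. It is not intended for any other kind of estimate.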
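The formula described above (estimate plus or minus the critical t-value times the standard error) can be checked by hand against confint on an lm fit. This is a minimal sketch on the simulated data from this answer; fit, est, se, t_crit and manual_ci are names introduced here for illustration.

```r
set.seed(111)
a <- rpois(n = 100, lambda = 10)
b <- rnorm(n = 100, mean = 5, sd = 1)
fit <- lm(a ~ b)

# Coefficient estimates and their standard errors from the model summary
coefs <- coef(summary(fit))
est <- coefs[, "Estimate"]
se <- coefs[, "Std. Error"]

# Critical t-value for a two-sided 95% interval on the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))

# Interval built by hand: estimate -/+ t_crit * standard error
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(manual_ci)

# Interval from confint for comparison
print(confint(fit, level = 0.95))
```

This only covers the regression coefficients. For a vector of bootstrap replicates of R^2, quantile() or boot.ci() are the tools the question already uses.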

r

confidence-interval

random-seed

statistics-bootstrap