如何使用 R 中的 "bootstrap function" 计算置信区间

Question

我正在尝试计算 R 中的置信区间。由于某些特殊原因，我必须使用 "bootstrap" 包中的函数来完成。（这意味着我不能使用 "boot"包。）

这是我的代码。

而我正在做的是尝试计算 Pearson 相关系数，然后应用 Bootstrap 方法（B = 100）来获得相关系数的估计值。但我不知道如何构建 95% 的置信区间。

library(bootstrap) 
data('law')

set.seed(1)
theta <- function(ind) {
  cor(law[ind, 1], law[ind, 2], method = "pearson")
  }
law.boot <- bootstrap(1:15, 100, theta) 
# sd(law$thetastar)
percent.95 <- function(x) {
  quantile(x,  .95)
  }
law.percent.95 <- bootstrap(1:15, 100, theta, func=percent.95)

抱歉，如果我没有说清楚或标记错误的标签。很抱歉没有生成数据集（现在提供了）并感谢 Roland 教授指出了这一点。非常感谢！

Answer 1

通常，在自举之后，我们使用 2.5% 和 97.5% 的百分位数作为 95% 的置信区间（因为我们从每边减去 α/2=.025）。另见 @thothal 的 和答案下的评论。

R <- 1e5 - 1  ## number of bootstrap replications
est <- with(law, cor(lsat, gpa))  ## naïve correlation

theta <- function(ind) cor(law[ind, 1], law[ind, 2], method="pearson")
set.seed(1)
B1 <- bootstrap::bootstrap(seq(nrow(law)), R, theta) 
(ci1 <- c(estimate=est, quantile(B1$thetastar, c(.025, .975))))
#  estimate      2.5%     97.5% 
# 0.7763745 0.4594845 0.9620884

这里是另一种从零开始的方法：

theta2 <- function(x) with(x, cor(lsat, gpa))
set.seed(1)
B2 <- replicate(R, theta2(law[sample(nrow(law), nrow(law), replace=TRUE), ]))
(ci2 <- c(estimate=est, quantile(B2, c(.025, .975))))
#  estimate      2.5%     97.5% 
# 0.7763745 0.4607644 0.9617970

最后是使用具有 boot.ci 功能的 boot 包的方法：

theta3 <- function(data, k) cor(data[k, ])[1,2]
set.seed(1)
B3 <- boot::boot(law, theta3, R=R)
(ci3 <- c(est, boot::boot.ci(B3, type='perc')$percent[4:5]))
# [1] 0.7763745 0.4593727 0.9620923

Answer 2

bootstrap 估计器有多种计算 CI 的方法（参见 to this Wikipedia article for instance。

最简单的方法是根据 bootstrapped 系数计算 2.5% 和 97.5% 分位数（维基百科文章中的百分位数 Bootstrap）：

quantile(law.boot$thetastar, c(0.025, 0.975))
#      2.5%     97.5% 
# 0.4528745 0.9454483

基本 Bootstrap 将计算为

2 * mean(law.boot$thetastar) - quantile(law.boot$thetastar, c(0.975, 0.025))
#     97.5%      2.5% 
# 0.5567887 1.0493625

如何使用 R 中的 "bootstrap function" 计算置信区间

How to calculate confidence interval using the "bootstrap function" in R

r

confidence-interval

statistics-bootstrap