Does the boot package in R use the first return(result) as the observed data to calculate confidence intervals?

Question

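I'm doing a bootstrap in R using the function boot, but I'm not passing my dataset directly as the data argument. Instead, I pass an index vector, which the statistic uses to merge two data tables and produce my result. It seems that boot uses the first result of the bootstrap as the real sampled data (the empirical value). Is this correct? When I do the bootstrap manually I get similar results, although I would have expected boot to use 'data' as the original data. I'm confused. The CI makes sense, but I would not expect it to work unless the reason is the one I mentioned.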

In short, I have an index vector

x=1:100

and my function

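library(boot)        # provides boot() and boot.ci()
library(data.table)  # provides as.data.table() and the merge method used below

myboot <- function(data, indeces) {
  toselect <- data[indeces] # allows boot to select sample
  toselect <- as.data.table(toselect)
  # this is where I use the index for the merge
  # (mydataset is the second data table; it is not shown in the question)
  merged <- merge(toselect, mydataset, allow.cartesian = TRUE)
  return(nrow(merged))
}
b <- boot(data = x, statistic = myboot, R = 1000)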

The results I get:

ORDINARY NONPARAMETRIC BOOTSTRAP

Call:
boot(data = x, statistic = myboot, R = 1000)

Bootstrap Statistics :
    original      bias    std. error
t1* 397.2477 -0.03669725    11.70803
> boot.ci(b, type="bca")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = b, type = "bca")

Intervals : 
Level       BCa          
95%   (375.2, 421.1 )

Answer 1

Yes, you are correct.

The function used to compute the statistic has the following requirements (according to the help page):

... In all other cases statistic must take at least two arguments. The first argument passed will always be the original data. The second will be a vector of indices, frequencies or weights which define the bootstrap sample. Further, if predictions are required, then a third argument is required which would be a vector of the random indices used to generate the bootstrap predictions.

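Since your dataset contains the numbers 1:100, the second argument passed to your function is a resample of 1:100, and subsetting the data with it gives back exactly the same values. In other words, your data[indeces] line is identical to indeces. The value reported as "original" is the statistic evaluated once with the indices in their original order (1:100), which is why it behaves like the observed value for the full dataset.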
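The point above can be checked with a small sketch. It does not use the asker's data: `lookup` and `stat` are hypothetical stand-ins for `mydataset` and `myboot`, and a plain data frame lookup replaces the data.table merge so that only the boot package is needed.

```r
library(boot)

set.seed(1)
x <- 1:100

# Hypothetical stand-in for mydataset: one value per index.
lookup <- data.frame(id = 1:100, value = rnorm(100))

# Statistic in the same shape as myboot: the "data" is only an index vector,
# and the real values are fetched from the lookup table inside the function.
stat <- function(data, indices) {
  ids <- data[indices]        # identical to indices because data is 1:100
  mean(lookup$value[ids])
}

b <- boot(data = x, statistic = stat, R = 200)

# Subsetting 1:100 by an index vector returns the index vector itself.
idx <- c(3L, 3L, 7L)
same_as_index <- identical(x[idx], idx)

# The reported "original" value (b$t0) is the statistic evaluated with the
# indices in their original order, i.e. on the full, unresampled lookup table.
t0_matches <- isTRUE(all.equal(b$t0, stat(x, seq_along(x))))
t0_is_full_data <- isTRUE(all.equal(b$t0, mean(lookup$value)))

print(c(same_as_index, t0_matches, t0_is_full_data))
```

If all three checks print TRUE, the "original" column in the boot output is the statistic on the complete data, so passing an index vector as `data` is a legitimate way to resample rows of another table.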

