Confidence intervals of loadings in principal components in R

Question

I am running a principal component analysis on the first 4 columns of the iris data set with R's prcomp function, using the following code:

> prcomp(iris[1:4])
Standard deviations:
[1] 2.0562689 0.4926162 0.2796596 0.1543862

Rotation:
                     PC1         PC2         PC3        PC4
Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

How can I get confidence intervals for these values in R? Is there a package that can do it? Thanks for your help.

Answer 1

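You can use bootstrapping for this. Resample the rows of your data with replacement, recompute the principal components on each resample, and record the loadings you care about. The resulting empirical distribution of each loading gives you its confidence interval.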
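To make the idea concrete, here is a minimal hand-rolled sketch in base R that produces percentile intervals for the four PC1 loadings. The seed, the 2000 replicates, and the helper name onePC1 are illustrative choices, not part of the original answer. It also assumes each replicate should be oriented like the full-data loadings, because the sign of a principal component is arbitrary.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
X   <- iris[1:4]
ref <- prcomp(X)$rotation[, 1]    # PC1 loadings on the full data (sign reference)

# One bootstrap replicate: resample rows with replacement, refit, return PC1 loadings
onePC1 <- function() {
  idx <- sample(nrow(X), replace = TRUE)
  rot <- prcomp(X[idx, ])$rotation[, 1]
  # Assumption: flip the replicate if it points opposite to the reference
  if (sum(rot * ref) < 0) -rot else rot
}

reps <- replicate(2000, onePC1())   # 4 x 2000 matrix, one column per replicate

# Percentile interval: 2.5% and 97.5% quantiles of each loading across replicates
ci <- t(apply(reps, 1, quantile, probs = c(0.025, 0.975)))
ci
```

Each row of ci holds the lower and upper bound for one variable's loading on PC1.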

The boot package makes this easy.

Here is an example that computes a 95% confidence interval for the loading of Sepal.Length on the first principal component:

library(boot)

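# Return one loading (variable vname, component pcnum) from a PCA of the data
getPrcStat <- function(samdf, vname, pcnum) {
  prcs <- prcomp(samdf[1:4])             # fit PCA; returns a prcomp object
  return(prcs$rotation[vname, pcnum])    # pick out the single loading we need
}

# Statistic in the form boot() expects: (data, vector of resampled row indices)
bootEst <- function(df, d) {
  sampledDf <- df[d, ]                   # resampled data frame
  return(getPrcStat(sampledDf, "Sepal.Length", 1))
}

bootOut <- boot(iris, bootEst, R = 10000)   # 10000 bootstrap replicates
boot.ci(bootOut, type = c("basic"))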

The output is:

  BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
  Based on 10000 bootstrap replicates

  CALL : 
  boot.ci(boot.out = bootOut, type = c("basic"))

  Intervals : 
  Level      Basic         
  95%   ( 0.3364,  1.1086 )  
  Calculations and Intervals on Original Scale

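So, using the usual basic bootstrap method, we get a 95% confidence interval between 0.3364 and 1.1086. There are plenty of more advanced statistical methods that could be used as well, but you need to know what you are doing.

One caveat: the upper bound is greater than 1, which no loading can be, since each column of the rotation matrix has unit length. That is what you would expect if some resamples returned the component with its sign flipped (the sign of a principal component is arbitrary), so the replicates probably need to be sign-aligned before this interval can be trusted.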
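A sketch of one way to keep the replicates comparable with boot: since prcomp may return a component pointing in either direction, the statistic below flips any resampled component that points opposite to the full-data loadings before extracting the value. The names refLoadings and alignedLoading, the seed, and R = 2000 are illustrative assumptions rather than part of the original answer, and aligning on the dot product is only one possible convention.

```r
library(boot)

# PC1 loadings on the full data, used as the sign reference
refLoadings <- prcomp(iris[1:4])$rotation[, 1]

# Statistic for boot(): loading of vname on component pcnum,
# flipped when the resampled component points opposite to the reference
alignedLoading <- function(df, idx, vname = "Sepal.Length", pcnum = 1,
                           ref = refLoadings) {
  rot <- prcomp(df[idx, 1:4])$rotation[, pcnum]
  if (sum(rot * ref) < 0) rot <- -rot   # assumption: align by dot product
  rot[[vname]]
}

set.seed(1)                              # arbitrary seed for reproducibility
bootAligned <- boot(iris, alignedLoading, R = 2000)
boot.ci(bootAligned, type = c("basic", "perc"))
```

With the signs aligned, every replicate stays within [-1, 1] and the bootstrap distribution is no longer split between a positive and a negative mode, so the basic and percentile intervals should be much narrower than the one shown above.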
