PCA with new Factors in R

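My objective is to fit a linear model with the same response, but with the predictors replaced by factors/scores. I am trying to find out which principal components to include in such a linear model if I want an R^2 of at least 0.9*r.squared from my original model. Which predictors should I choose?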

    model1 <- lm(Resp ~ ., data = test_dat)
    summary(model1)

    Call:
    lm(formula = Resp ~ ., data = test_dat)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.35934 -0.07729  0.00330  0.08204  0.38709

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -3.18858    0.06926 -46.039   <2e-16 ***
    Pred1        4.32083    0.03767 114.708   <2e-16 ***
    Pred2        2.42110    0.04740  51.077   <2e-16 ***
    Pred3       -1.00507    0.04435 -22.664   <2e-16 ***
    Pred4       -3.19480    0.09147 -34.927   <2e-16 ***
    Pred5        2.77779    0.05368  51.748   <2e-16 ***
    Pred6        1.22923    0.05427  22.648   <2e-16 ***
    Pred7       -1.21338    0.04562 -26.595   <2e-16 ***
    Pred8        0.02485    0.05937   0.419    0.676
    Pred9       -0.67831    0.05308 -12.778   <2e-16 ***
    Pred10       1.69947    0.02628  64.672   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.1193 on 489 degrees of freedom
    Multiple R-squared:  0.997,   Adjusted R-squared:  0.997
    F-statistic: 1.645e+04 on 10 and 489 DF,  p-value: < 2.2e-16

My new model should have an R^2 >= 0.897:

    r.sqrd <- summary(model1)$r.squared  # R^2 of the original model
    (threshold <- 0.9 * r.sqrd)
    [1] 0.8973323

    library(relaimpo)  # provides calc.relimp()

    metrics.swiss <- calc.relimp(model1, type = c("lmg", "first", "last", "betasq", "pratt"))
    metrics.swiss
    metrics.swiss@lmg.rank
     Pred1  Pred2  Pred3  Pred4  Pred5  Pred6  Pred7  Pred8  Pred9 Pred10
         2      8      3      6      1     10      5      4      7      9

    sum(metrics.swiss@lmg)
    # Column positions sorted from lmg rank 1 to rank 10
    orderComponents <- c(5, 1, 3, 8, 7, 4, 9, 2, 10, 6)
    # Project.Data.PCA is presumably a princomp() fit of the predictors
    PCAFactors <- Project.Data.PCA$scores
    Rotated <- as.data.frame(cbind(Resp = test_dat$Resp, PCAFactors))
    swissRotatedReordered <- Rotated[, c(1, orderComponents + 1)]
    # R^2 of nested models using the first 1, 2, ..., 10 reordered components
    (nestedRSquared <- sapply(2:11, function(z)
      summary(lm(Resp ~ ., data = swissRotatedReordered[, 1:z]))$r.squared))
     [1] 0.001041492 0.622569992 0.689046489 0.690319839 0.715051745 0.732286987
     [7] 0.742441421 0.991291253 0.995263470 0.997035905
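
To read the answer off the nested R^2 values, one option is to look up the first position that reaches the threshold. This is only a sketch: the numbers are copied from the output above, and `n_needed` is a name introduced here for illustration.

```r
# Nested R^2 values and threshold copied from the output above
nestedRSquared <- c(0.001041492, 0.622569992, 0.689046489, 0.690319839,
                    0.715051745, 0.732286987, 0.742441421, 0.991291253,
                    0.995263470, 0.997035905)
threshold <- 0.8973323

# Smallest number of components (in the chosen order) whose R^2 meets the threshold
n_needed <- which(nestedRSquared >= threshold)[1]
n_needed
# [1] 8
```

With this particular ordering, eight components are needed. A different ordering of the components may reach the threshold with fewer.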

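Fit a new linear model on your scores. The "lmg" metric shows which factors contribute the most, and those are the ones you should keep. In my case, it was the first 3 factors.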

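    library(relaimpo)  # provides calc.relimp()

    # Drop the first column (the response, assuming Resp comes first)
    predictors <- test_dat[-1]
    Project.Data.PCA <- princomp(predictors)
    summary(Project.Data.PCA)
    PCAFactors <- Project.Data.PCA$scores
    Rotated <- as.data.frame(cbind(Resp = test_dat$Resp, PCAFactors))
    # Regress the response on the component scores and rank their contributions
    linModPCA <- lm(Resp ~ ., data = Rotated)
    metrics.swiss <- calc.relimp(linModPCA, type = c("lmg", "first", "last", "betasq", "pratt"))
    metrics.swiss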
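
The same selection can be sketched in base R without `relaimpo`. Because the component scores are mutually uncorrelated, the R^2 of a model on any subset of them is the sum of each component's squared correlation with the response, so ranking by those values and accumulating gives the smallest set that reaches the threshold. Everything below is an assumption for illustration: `sim_dat` is simulated data, not the original `test_dat`, and names such as `r2_each`, `ranked`, and `keep` are made up here.

```r
# Hypothetical simulated data standing in for test_dat (not the original data)
set.seed(1)
n <- 500
X <- matrix(rnorm(n * 5), n, 5)
colnames(X) <- paste0("Pred", 1:5)
Resp <- as.vector(X %*% c(4, 2, -1, 3, 0.5)) + rnorm(n, sd = 0.1)
sim_dat <- data.frame(Resp = Resp, X)

# Reference R^2 from the full model and the 90% target
full_r2 <- summary(lm(Resp ~ ., data = sim_dat))$r.squared
threshold <- 0.9 * full_r2

# Component scores from PCA on the predictors only
scores <- princomp(sim_dat[-1])$scores

# R^2 contributed by each component on its own (squared correlation with Resp)
r2_each <- as.vector(cor(sim_dat$Resp, scores))^2
names(r2_each) <- colnames(scores)

# Rank components by contribution and accumulate until the target is met
ranked <- sort(r2_each, decreasing = TRUE)
cum_r2 <- cumsum(ranked)
k <- which(cum_r2 >= threshold)[1]
keep <- names(ranked)[seq_len(k)]

# Refit using only the selected components
reduced_dat <- data.frame(Resp = sim_dat$Resp, scores[, keep, drop = FALSE])
reduced <- lm(Resp ~ ., data = reduced_dat)
summary(reduced)$r.squared
```

The components in `keep` need not be the first ones by explained variance, since a low-variance component can still correlate strongly with the response.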