Extracting beta values from trained caret model

I am trying to extract the beta values from a model fitted with train() from the caret package.

cv_model_pls <- train(
  POD1HemoglobinCut ~ ., 
  data = train, 
  method = "pls",
  family = "binomial",
  trControl = trainControl(method = "cv", number = 10),
  preProcess = c("zv", "center", "scale"),
  tuneLength = 6
)

The output is:

> cv_model_pls
Partial Least Squares 

9932 samples
   7 predictor
   2 classes: '[0,10)', '[10,Inf)' 

Pre-processing: centered (7), scaled (7) 
Resampling: Cross-Validated (10 fold) 
Summary of sample sizes: 8939, 8939, 8939, 8938, 8940, 8939, ... 
Resampling results across tuning parameters:

  ncomp  Accuracy   Kappa    
  1      0.8569258  0.1994938
  2      0.8698149  0.3215483
  3      0.8707213  0.3303433
  4      0.8710237  0.3335666
  5      0.8710238  0.3341072
  6      0.8708224  0.3330295

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 5.

Running summary() to try to get the beta values gives me:

> summary(cv_model_pls)
Data:   X dimension: 9932 7 
    Y dimension: 9932 2
Fit method: oscorespls
Number of components considered: 5
TRAINING: % variance explained
Error in dimnames(tbl) <- list(c("X", yvarnames), paste(1:object$ncomp,  : 
  length of 'dimnames' [1] not equal to array extent
  1. How do I extract the beta values of the optimized model (or any other model)?
  2. How can I select the model by maximizing sensitivity, rather than the default, accuracy?

By beta values, I assume you mean the coefficients. The summary function calls pls:::summary.mvr from the pls package, which only returns the variance explained. You can run ?pls:::summary.mvr to see what it does. It does not work with the output of plsda.
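As an aside, if you only need the coefficients at the fitted number of components, the pls package also provides a coef() method for mvr objects, which is what the final model is under the hood. A sketch (assuming cv_model_pls was fitted as in the question; coef() here returns a predictors x responses x 1 array):

# coef() from pls extracts regression coefficients for the requested
# number of components from an "mvr" object
betas <- coef(cv_model_pls$finalModel, ncomp = cv_model_pls$bestTune$ncomp)
drop(betas)  # drop the length-1 third dimension to get a plain matrix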

Using an example dataset, we fit with caret:

library(caret)

set.seed(111)
df <- MASS::Pima.tr

cv_model_pls <- train(
  type ~ .,
  data = df,
  method = "pls",
  family = "binomial",
  trControl = trainControl(method = "cv", number = 5),
  preProcess = c("center", "scale"),
  tuneLength = 6
)

The result:

Partial Least Squares 

200 samples
  7 predictor
  2 classes: 'No', 'Yes' 

Pre-processing: centered (7), scaled (7) 
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 159, 161, 159, 161, 160 
Resampling results across tuning parameters:

  ncomp  Accuracy   Kappa    
  1      0.7301063  0.3746033
  2      0.7504909  0.4255505
  3      0.7453627  0.4140426
  4      0.7553690  0.4412532
  5      0.7502408  0.4275158
  6      0.7502408  0.4275158

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 4.

You can find the coefficients under the final fitted model:

cv_model_pls$finalModel$coefficients

It stores the coefficients for every number of components up to the maximum tried, so to get those of the best model in this example, run:

cv_model_pls$finalModel$coefficients[,,cv_model_pls$bestTune$ncomp]
                No          Yes
npreg -0.060740474  0.060740474
glu   -0.173639051  0.173639051
bp     0.006635470 -0.006635470
skin  -0.002510842  0.002510842
bmi   -0.065740864  0.065740864
ped   -0.086110972  0.086110972
age   -0.076374824  0.076374824
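Note that for a two-class fit the dummy-coded responses are complementary, so the two columns are exact negatives of each other, as the output above shows; keeping one column is enough if you want a single beta vector. For example:

# keep only the "Yes" column as the beta vector for the best model
betas <- cv_model_pls$finalModel$coefficients[, "Yes", cv_model_pls$bestTune$ncomp]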

For sensitivity, use summaryFunction = twoClassSummary in trainControl (with classProbs = TRUE) and set the metric to "Sens":

model <- train(
  type ~ .,
  data = df,
  method = "pls",
  family = "binomial",
  trControl = trainControl(
    method = "cv",
    number = 5,
    summaryFunction = twoClassSummary,
    classProbs = TRUE
  ),
  metric = "Sens",
  preProcess = c("center", "scale"),
  tuneLength = 6
)

Partial Least Squares 

200 samples
  7 predictor
  2 classes: 'No', 'Yes' 

Pre-processing: centered (7), scaled (7) 
Resampling: Cross-Validated (5 fold) 
Summary of sample sizes: 159, 161, 161, 159, 160 
Resampling results across tuning parameters:

  ncomp  ROC        Sens       Spec     
  1      0.8227357  0.8635328  0.5571429
  2      0.8286638  0.8555556  0.5428571
  3      0.8250728  0.8709402  0.5571429
  4      0.8247738  0.8555556  0.5571429
  5      0.8264237  0.8555556  0.5428571
  6      0.8258946  0.8632479  0.5428571

Sens was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 3.