Extracting beta values from trained caret model
I am trying to extract the beta values from a model fit with train() from the caret package.
cv_model_pls <- train(
POD1HemoglobinCut ~ .,
data = train,
method = "pls",
family = "binomial",
trControl = trainControl(method = "cv", number = 10),
preProcess = c("zv", "center", "scale"),
tuneLength = 6
)
The output is:
> cv_model_pls
Partial Least Squares
9932 samples
7 predictor
2 classes: '[0,10)', '[10,Inf)'
Pre-processing: centered (7), scaled (7)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 8939, 8939, 8939, 8938, 8940, 8939, ...
Resampling results across tuning parameters:
ncomp Accuracy Kappa
1 0.8569258 0.1994938
2 0.8698149 0.3215483
3 0.8707213 0.3303433
4 0.8710237 0.3335666
5 0.8710238 0.3341072
6 0.8708224 0.3330295
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 5.
Running summary() to try to get the beta values gives me:
> summary(cv_model_pls)
Data: X dimension: 9932 7
Y dimension: 9932 2
Fit method: oscorespls
Number of components considered: 5
TRAINING: % variance explained
Error in dimnames(tbl) <- list(c("X", yvarnames), paste(1:object$ncomp, :
length of 'dimnames' [1] not equal to array extent
- How do I extract the beta values for the optimized model (or any other model)?
- How do I select the model by maximizing sensitivity rather than the default, accuracy?
By beta values, I assume you mean the coefficients. The summary function calls pls:::summary.mvr from the pls package, which only returns the variance explained. You can run ?pls:::summary.mvr to see what it does. It does not work on the output of plsda.
Using an example dataset, we fit with caret:
set.seed(111)
df = MASS::Pima.tr
cv_model_pls <- train(type~.,data=df,method="pls",
family="binomial",trControl = trainControl(method = "cv", number = 5),
preProcess = c("center", "scale"),
tuneLength = 6
)
Result:
Partial Least Squares
200 samples
7 predictor
2 classes: 'No', 'Yes'
Pre-processing: centered (7), scaled (7)
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 159, 161, 159, 161, 160
Resampling results across tuning parameters:
ncomp Accuracy Kappa
1 0.7301063 0.3746033
2 0.7504909 0.4255505
3 0.7453627 0.4140426
4 0.7553690 0.4412532
5 0.7502408 0.4275158
6 0.7502408 0.4275158
Accuracy was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 4.
You can find the coefficients under the final fitted model:
cv_model_pls$finalModel$coefficients
This array holds coefficients for every number of components tried, and the optimal number is stored in bestTune, so in this example, do:
cv_model_pls$finalModel$coefficients[,,cv_model_pls$bestTune$ncomp]
No Yes
npreg -0.060740474 0.060740474
glu -0.173639051 0.173639051
bp 0.006635470 -0.006635470
skin -0.002510842 0.002510842
bmi -0.065740864 0.065740864
ped -0.086110972 0.086110972
age -0.076374824 0.076374824
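As a cross-check, the same slice should also be obtainable through the pls package's coef() method, since caret's finalModel here is an mvr object. A minimal sketch, assuming caret, pls, and MASS are installed and refitting the example model above:

```r
library(caret)

# Refit the example model with the same settings as above.
set.seed(111)
df <- MASS::Pima.tr
cv_model_pls <- train(type ~ ., data = df, method = "pls",
                      family = "binomial",
                      trControl = trainControl(method = "cv", number = 5),
                      preProcess = c("center", "scale"),
                      tuneLength = 6)

# coef() dispatches to pls:::coef.mvr and returns a
# predictors x responses x length(ncomp) array for the requested ncomp.
beta <- coef(cv_model_pls$finalModel, ncomp = cv_model_pls$bestTune$ncomp)
beta[, , 1]  # drop the trailing single-component dimension
```

This should match the manual indexing into finalModel$coefficients shown above; note that because of the center/scale preprocessing, these coefficients are on the standardized predictor scale.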
For sensitivity, use summaryFunction = twoClassSummary in trainControl and set the metric to "Sens":
model <- train(type~.,data=df,method="pls",
family="binomial",
trControl = trainControl(method = "cv",
summaryFunction = twoClassSummary,
classProbs = TRUE,
number = 5),
metric = "Sens",
preProcess = c("center", "scale"),
tuneLength = 6
)
Partial Least Squares
200 samples
7 predictor
2 classes: 'No', 'Yes'
Pre-processing: centered (7), scaled (7)
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 159, 161, 161, 159, 160
Resampling results across tuning parameters:
ncomp ROC Sens Spec
1 0.8227357 0.8635328 0.5571429
2 0.8286638 0.8555556 0.5428571
3 0.8250728 0.8709402 0.5571429
4 0.8247738 0.8555556 0.5571429
5 0.8264237 0.8555556 0.5428571
6 0.8258946 0.8632479 0.5428571
Sens was used to select the optimal model using the largest value.
The final value used for the model was ncomp = 3.
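One caveat worth checking when tuning on Sens: twoClassSummary computes sensitivity for the first factor level of the outcome, which caret treats as the event of interest. A short sketch, assuming MASS is installed, of verifying and (hypothetically) changing the positive class:

```r
df <- MASS::Pima.tr

# twoClassSummary reports Sens for the first factor level, so confirm
# which class caret will treat as the "event" before tuning on Sens.
levels(df$type)  # "No" "Yes" -> Sens above is sensitivity for "No"

# If "Yes" should be the positive class instead, relevel before calling train():
df$type <- relevel(df$type, ref = "Yes")
```

After releveling, rerunning the train() call above would select the model by sensitivity for "Yes" rather than "No".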