Why do we need to tune lambda over a grid with caret::train(..., method = "glmnet") when cv.glmnet() can already find it?
As we have seen, both caret::train(..., method = "glmnet") with cross-validation and cv.glmnet() can find lambda.min, the lambda value that minimizes the cross-validation error. The final best-fit model should be the one fitted at lambda.min. So why do we need to set up a grid of lambda values during training?
We use a custom tuning grid for a glmnet model because the default tuning grid is very small, and we may want to explore more potential glmnet models.
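To see how small the default is: when no tuneGrid is supplied, train() builds its grid from tuneLength (which defaults to 3). A hedged sketch of asking caret's model registry for that default grid, using toy data (the exact lambda values depend on the data, since caret derives them from an initial glmnet fit):

```r
library(caret)

# Look up caret's built-in grid function for method = "glmnet"
info <- getModelInfo("glmnet", regex = FALSE)[[1]]

# Toy predictors and response, just to let the grid function run
set.seed(1)
x <- data.frame(a = rnorm(50), b = rnorm(50))
y <- rnorm(50)

# The default grid for tuneLength = 3: only a handful of
# alpha/lambda combinations, far fewer than a custom grid
g <- info$grid(x, y, len = 3)
g
```
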
glmnet can fit two different kinds of penalized models, and it has two tuning parameters:
- alpha
  - ridge regression (alpha = 0)
  - lasso regression (alpha = 1)
- lambda
  - the strength of the penalty on the coefficients
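To make the roles of the two parameters concrete, the elastic-net objective that glmnet minimizes (written here for a Gaussian response) is:

$$
\min_{\beta_0,\,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2
+\lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2+\alpha\lVert\beta\rVert_1\right]
$$

With alpha = 0 only the squared (ridge) term survives; with alpha = 1 only the absolute-value (lasso) term survives; lambda scales the whole penalty up or down.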
A glmnet model can fit many models at once (for a single alpha, all values of lambda are fit simultaneously), and we can control the amount of penalization in the model through a large number of lambda values.
train() is smart enough to fit only one model per alpha value and to pass all of the lambda values at once to that model.
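For comparison, cv.glmnet() does the same per-alpha fitting directly, but handles only one alpha at a time; to tune alpha as well you must loop over alpha values yourself, which is exactly what train() automates. A minimal sketch on toy data:

```r
library(glmnet)

# Toy predictor matrix and binary response
set.seed(42)
x <- matrix(rnorm(100 * 5), nrow = 100)
y <- factor(rbinom(100, 1, 0.5))

# cv.glmnet() fits the whole lambda path for ONE alpha
# and cross-validates over lambda only
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)

cv_fit$lambda.min  # lambda minimizing CV deviance, for alpha = 1 only
```
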
Example:
# Make a custom tuning grid
tuneGrid <- expand.grid(alpha = 0:1, lambda = seq(0.0001, 1, length = 10))

# Fit a model
model <- train(
  y ~ ., overfit,
  method = "glmnet",
  tuneGrid = tuneGrid,
  trControl = myControl
)
# Sample Output
Warning message: The metric "Accuracy" was not in the result set. ROC will be used instead.
+ Fold01: alpha=0, lambda=1
- Fold01: alpha=0, lambda=1
+ Fold01: alpha=1, lambda=1
- Fold01: alpha=1, lambda=1
+ Fold02: alpha=0, lambda=1
- Fold02: alpha=0, lambda=1
+ Fold02: alpha=1, lambda=1
- Fold02: alpha=1, lambda=1
+ Fold03: alpha=0, lambda=1
- Fold03: alpha=0, lambda=1
+ Fold03: alpha=1, lambda=1
- Fold03: alpha=1, lambda=1
+ Fold04: alpha=0, lambda=1
- Fold04: alpha=0, lambda=1
+ Fold04: alpha=1, lambda=1
- Fold04: alpha=1, lambda=1
+ Fold05: alpha=0, lambda=1
- Fold05: alpha=0, lambda=1
+ Fold05: alpha=1, lambda=1
- Fold05: alpha=1, lambda=1
+ Fold06: alpha=0, lambda=1
- Fold06: alpha=0, lambda=1
+ Fold06: alpha=1, lambda=1
- Fold06: alpha=1, lambda=1
+ Fold07: alpha=0, lambda=1
- Fold07: alpha=0, lambda=1
+ Fold07: alpha=1, lambda=1
- Fold07: alpha=1, lambda=1
+ Fold08: alpha=0, lambda=1
- Fold08: alpha=0, lambda=1
+ Fold08: alpha=1, lambda=1
- Fold08: alpha=1, lambda=1
+ Fold09: alpha=0, lambda=1
- Fold09: alpha=0, lambda=1
+ Fold09: alpha=1, lambda=1
- Fold09: alpha=1, lambda=1
+ Fold10: alpha=0, lambda=1
- Fold10: alpha=0, lambda=1
+ Fold10: alpha=1, lambda=1
- Fold10: alpha=1, lambda=1
Aggregating results
Selecting tuning parameters
Fitting alpha = 1, lambda = 1 on full training set
# Print model to console
model
# Sample Output
glmnet
250 samples
200 predictors
2 classes: 'class1', 'class2'
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 225, 225, 225, 225, 224, 226, ...
Resampling results across tuning parameters:
alpha lambda ROC Sens Spec
0 0.0001 0.3877717 0.00 0.9786232
0 0.1112 0.4352355 0.00 1.0000000
0 0.2223 0.4546196 0.00 1.0000000
0 0.3334 0.4589674 0.00 1.0000000
0 0.4445 0.4718297 0.00 1.0000000
0 0.5556 0.4762681 0.00 1.0000000
0 0.6667 0.4783514 0.00 1.0000000
0 0.7778 0.4826087 0.00 1.0000000
0 0.8889 0.4869565 0.00 1.0000000
0 1.0000 0.4869565 0.00 1.0000000
1 0.0001 0.3368659 0.05 0.9188406
1 0.1112 0.5000000 0.00 1.0000000
1 0.2223 0.5000000 0.00 1.0000000
1 0.3334 0.5000000 0.00 1.0000000
1 0.4445 0.5000000 0.00 1.0000000
1 0.5556 0.5000000 0.00 1.0000000
1 0.6667 0.5000000 0.00 1.0000000
1 0.7778 0.5000000 0.00 1.0000000
1 0.8889 0.5000000 0.00 1.0000000
1 1.0000 0.5000000 0.00 1.0000000
ROC was used to select the optimal model using the largest value.
The final values used for the model were alpha = 1 and lambda = 1.
# Plot model
plot(model)
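After training, the chosen parameter pair and its resampled performance can be read straight off the fitted train object via model$bestTune and model$results. A self-contained sketch on toy data (simulated here so it runs on its own; the original example's overfit data and myControl are not reproduced):

```r
library(caret)

# Simulate a small binary-classification data set
set.seed(1)
d <- data.frame(matrix(rnorm(200 * 4), ncol = 4))
d$y <- factor(ifelse(rowSums(d[, 1:2]) + rnorm(200) > 0, "a", "b"))

# Fit glmnet over a tiny alpha/lambda grid with 3-fold CV
m <- train(
  y ~ ., d,
  method = "glmnet",
  tuneGrid = expand.grid(alpha = 0:1, lambda = c(0.001, 0.1)),
  trControl = trainControl(method = "cv", number = 3)
)

m$bestTune               # one-row data frame: winning alpha and lambda
max(m$results$Accuracy)  # best cross-validated accuracy across the grid
```

The final model stored in m is refit on the full training set at m$bestTune, so predict(m, newdata) uses the winning alpha/lambda pair automatically.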