How do you get the test error metrics from a `tune_grid` object?
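I'm confused by the output of `tune::tune_grid()`. Essentially, I'd like to get the root mean squared errors (RMSEs) for any given set of hyperparameters in the grid.
For example, the following code uses 10-fold cross-validation to try 50 different `penalty` values in a ridge regression.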
library(tidymodels)  # rsample, recipes, parsnip, workflows, dials, tune

# Silly data (the ISLR package must be installed)
df <- ISLR::College

# 10 folds
set.seed(42)
cv <- vfold_cv(data = df, v = 10)

# Normalize predictors in a pipeline
recipe <-
  recipe(formula = Apps ~ ., data = df) %>%
  step_novel(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())

# Ridge regression instance with tuneable `penalty`
ridge_spec <-
  linear_reg(penalty = tune(), mixture = 0) %>%
  set_mode("regression") %>%
  set_engine("glmnet")

# Last two steps in a workflow
ridge_workflow <- workflow() %>%
  add_recipe(recipe) %>%   # Normalize
  add_model(ridge_spec)    # Fit

# Grid of penalty hyperparameters (the range is on the log10 scale)
penalty_grid <- grid_regular(penalty(range = c(-5, 5)), levels = 50)

# Fit a model per penalty value on 10 folds
ridge_grid <- tune_grid(
  object = ridge_workflow,
  resamples = cv,
  grid = penalty_grid,
  control = control_grid(verbose = FALSE)
)
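I'd like to get the 10 RMSEs of the best model.
I thought `ridge_grid$.metrics` would have this information, but it holds 10 tibbles, each with 10 rows. What do these mean?
How do I get the 10 RMSEs of the best model?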
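`ridge_grid$.metrics` holds all the holdout performance estimates for each parameter value, one tibble per resample. To get the metric values averaged over the resamples for each parameter combination, you can use `collect_metrics()`: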
estimates <- collect_metrics(ridge_grid)
estimates
# A tibble: 100 × 7
penalty .metric .estimator mean n std_err .config
<dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0.00001 rmse standard 1183. 10 162. Preprocessor1_Model01
2 0.00001 rsq standard 0.913 10 0.00823 Preprocessor1_Model01
3 0.0000160 rmse standard 1183. 10 162. Preprocessor1_Model02
4 0.0000160 rsq standard 0.913 10 0.00823 Preprocessor1_Model02
5 0.0000256 rmse standard 1183. 10 162. Preprocessor1_Model03
6 0.0000256 rsq standard 0.913 10 0.00823 Preprocessor1_Model03
7 0.0000409 rmse standard 1183. 10 162. Preprocessor1_Model04
8 0.0000409 rsq standard 0.913 10 0.00823 Preprocessor1_Model04
9 0.0000655 rmse standard 1183. 10 162. Preprocessor1_Model05
10 0.0000655 rsq standard 0.913 10 0.00823 Preprocessor1_Model05
# … with 90 more rows
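As a rough sketch of what that averaging amounts to: for each candidate and metric, the per-fold estimates are reduced to a mean, a count, and a standard error (as far as I know, the standard deviation divided by the square root of `n`). The `fold_estimates` tibble and its numbers below are made up for illustration and are not taken from `ridge_grid`; only the roll-up logic is the point.

```r
library(dplyr)
library(tibble)

# Hypothetical per-fold RMSE estimates for two candidate models (3 folds each)
fold_estimates <- tribble(
  ~id,     ~.config, ~.metric, ~.estimate,
  "Fold1", "Model1", "rmse",   10,
  "Fold2", "Model1", "rmse",   12,
  "Fold3", "Model1", "rmse",   14,
  "Fold1", "Model2", "rmse",   20,
  "Fold2", "Model2", "rmse",   20,
  "Fold3", "Model2", "rmse",   26
)

# Roll the folds up into one row per candidate and metric
manual_summary <- fold_estimates %>%
  group_by(.config, .metric) %>%
  summarise(
    mean    = mean(.estimate),
    n       = n(),
    std_err = sd(.estimate) / sqrt(n),
    .groups = "drop"
  )

manual_summary
```

This is why the summarized table has one row per penalty value and metric, with `n = 10` in the real example: each row condenses the 10 fold-level estimates.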
To pull out the mean RMSE over the 10 resamples and rank the candidates, you can use the following code:
rmse_vals <-
estimates %>%
dplyr::filter(.metric == "rmse") %>%
arrange(desc(mean))
rmse_vals
# A tibble: 50 × 7
penalty .metric .estimator mean n std_err .config
<dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 100000 rmse standard 3332. 10 309. Preprocessor1_Model50
2 62506. rmse standard 3136. 10 306. Preprocessor1_Model49
3 39069. rmse standard 2884. 10 302. Preprocessor1_Model48
4 24421. rmse standard 2589. 10 295. Preprocessor1_Model47
5 15264. rmse standard 2281. 10 285. Preprocessor1_Model46
6 9541. rmse standard 1993. 10 272. Preprocessor1_Model45
7 5964. rmse standard 1753. 10 258. Preprocessor1_Model44
8 3728. rmse standard 1568. 10 243. Preprocessor1_Model43
9 2330. rmse standard 1435. 10 227. Preprocessor1_Model42
10 1456. rmse standard 1342. 10 211. Preprocessor1_Model41
# … with 40 more rows
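Because of `desc(mean)`, this table is sorted from the largest mean RMSE to the smallest, so the best (lowest) RMSE is in the last row; use `arrange(mean)` instead to list the best model first. Plotting the values lets you check that they look right: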
autoplot(ridge_grid, metric = "rmse")
Output: (plot of RMSE against the penalty; image not preserved)
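The question also asks for the 10 individual RMSEs of the best model, not just their mean. A minimal sketch of one way to get them, assuming the tune package's `select_best()` and the `summarize = FALSE` argument of `collect_metrics()`; the helper name `fold_rmse_of_best` is hypothetical and only there for illustration:

```r
library(tidymodels)

# Hypothetical helper: per-resample RMSE values of the candidate with the
# lowest mean RMSE, for a result returned by tune_grid()
fold_rmse_of_best <- function(tune_res) {
  # select_best() returns a one-row tibble with the winning parameters and .config
  best <- select_best(tune_res, metric = "rmse")

  # summarize = FALSE keeps one row per resample instead of averaging them
  collect_metrics(tune_res, summarize = FALSE) %>%
    dplyr::filter(.metric == "rmse", .config == best$.config)
}

# Usage, assuming `ridge_grid` from the question is still in the session:
# fold_rmse_of_best(ridge_grid)
```

With 10-fold cross-validation this should return 10 rows, one `.estimate` per fold, all sharing the same `.config`.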