mlr3: using benchmark() with tuned models (i.e. AutoTuner objects)
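I would like to compare the performance of several machine learning algorithms (e.g. decision trees from rpart, xgb, ...), including their hyperparameter tuning, using mlr3. In other words, I want to compare already tuned instances of the different algorithms rather than the algorithms with their default hyperparameter values.
mlr3 provides AutoTuner objects to carry out nested resampling and hyperparameter tuning. There is also a benchmark() function to compare several learners, and benchmark() in turn uses benchmark_grid() to set up the benchmark design. According to this manual, one can pass "an AutoTuner object to mlr3::resample() or mlr3::benchmark()". What I do not understand is how to pass an AutoTuner object to benchmark_grid().
The following code (benchmarking a tuned decision tree against the default version; based on the code in this book) does not work. It returns the error message: "Error: Element with key 'rpart_tuned' not found in DictionaryLearner!"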
library("mlr3verse")
### Benchmarking including hyperparameter tuning
# nested resampling:
# - inner sampling: 5-fold CV
# - outer sampling: manually defined hold-out sample
# task used to instantiate the outer resampling
task = tsk("iris")
# defining AutoTuner for the inner resampling
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 5)
# resampling = rsmp("holdout")
measure = msr("classif.acc")
search_space = ps(maxdepth = p_int(lower = 1, upper = 10))
terminator = trm("none")
tuner = tnr("grid_search", resolution = 5)
# named arguments avoid depending on the positional order of AutoTuner$new()
rpart_tuned = AutoTuner$new(learner = learner, resampling = resampling,
  measure = measure, terminator = terminator, tuner = tuner,
  search_space = search_space)
## Outer re-sampling
# hold-out sample with pre-defined partitioning into train and test set
outer_resampling = rsmp("custom")
train_sets = list(1:120)
test_sets = list(121:150)
outer_resampling$instantiate(task, train_sets, test_sets)
## Defining benchmark design
# the lrns() call below is the one that raises the error
design = benchmark_grid(
  tasks = tsks(c("iris")),
  learners = lrns(c("rpart_tuned", "classif.rpart"),
    predict_type = "prob", predict_sets = c("train", "test")),
  resamplings = outer_resampling
)
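The problem in your code is that you are trying to create a new learner instead of reusing your own in
lrns(c("rpart_tuned", "classif.rpart"),
  predict_type = "prob", predict_sets = c("train", "test")),
lrns(c("rpart_tuned")) tries to retrieve rpart_tuned from the built-in learner dictionary in mlr3. No learner is registered under that key: rpart_tuned is the name of an R variable in your session, not a dictionary key.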
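A minimal sketch of that lookup, assuming only that mlr3 is installed. It checks which keys exist in the learner dictionary (mlr_learners) and captures the error that lrn() raises for an unknown key; the variable name res is just for illustration.

```r
library(mlr3)

# Dictionary keys are plain strings registered by mlr3 and its extension packages.
"classif.rpart" %in% mlr_learners$keys()  # TRUE
"rpart_tuned" %in% mlr_learners$keys()    # FALSE

# lrn() resolves its argument through the dictionary, so an unknown key is an
# error, even if an R variable with the same name exists in the session.
res = tryCatch(lrn("rpart_tuned"), error = function(e) conditionMessage(e))
print(res)
```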
If you want to reuse rpart_tuned, pass the object itself:
design = benchmark_grid(
  tasks = tsks(c("iris")),
  learners = c(rpart_tuned, lrn("classif.rpart",
    predict_type = "prob", predict_sets = c("train", "test"))),
  resamplings = outer_resampling
)
This in turn uses the rpart_tuned AutoTuner and creates a new classif.rpart learner from the dictionary.
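For completeness, here is a hedged end-to-end sketch of how the corrected design might be run and scored. It rebuilds the objects from the question so that it is self-contained. The logger calls and the variable names bmr and scores are additions for illustration, and the learners are left at their default predict_type to keep the example short.

```r
library(mlr3verse)  # attaches mlr3, mlr3tuning and paradox

# Optional: reduce console output during tuning and benchmarking.
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

task = tsk("iris")

# AutoTuner: tunes maxdepth of rpart with an inner 5-fold CV.
# Named arguments avoid depending on the positional order of AutoTuner$new().
rpart_tuned = AutoTuner$new(
  learner = lrn("classif.rpart"),
  resampling = rsmp("cv", folds = 5),
  measure = msr("classif.acc"),
  terminator = trm("none"),
  tuner = tnr("grid_search", resolution = 5),
  search_space = ps(maxdepth = p_int(lower = 1, upper = 10))
)

# Outer resampling: fixed hold-out split as in the question.
outer_resampling = rsmp("custom")
outer_resampling$instantiate(task,
  train_sets = list(1:120), test_sets = list(121:150))

# Pass the AutoTuner object itself next to an untuned learner.
design = benchmark_grid(
  tasks = task,
  learners = list(rpart_tuned, lrn("classif.rpart")),
  resamplings = outer_resampling
)

bmr = benchmark(design)
scores = bmr$aggregate(msr("classif.acc"))
print(scores[, c("learner_id", "classif.acc")])
```

The result should contain one row per learner, with the AutoTuner listed under its own id next to the untuned classif.rpart.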