Using pre-defined train and test sets in a benchmark in mlr3

Question

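I want to compare several machine learning algorithms on a classification task using the benchmark_grid() function in mlr3. According to https://mlr3book.mlr-org.com/benchmarking.html, benchmark_grid() takes a resampling scheme that splits the data in a task into training and test data. However, I want to use a manual partition. How can I specify the training and test sets manually when using benchmark_grid()?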
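A manual partition ultimately has to be expressed as vectors of row ids that rsmp("custom") can take. The sketch below shows one way to derive those ids from a rule instead of hard-coding ranges. The rule itself (first 40 rows of each species for training, remaining 10 for testing) is a hypothetical example chosen for illustration, not part of the original question; it is stratified by class, which avoids an all-virginica test set on the species-ordered iris data.

```r
library("mlr3")

# Load the iris task; its rows are ordered by species (50 rows each)
task = tsk("iris")

# Hypothetical manual rule: within each species, the first 40 rows are used
# for training and the remaining 10 rows for testing
ids = task$row_ids
species = task$truth(ids)
position_in_class = ave(seq_along(ids), species, FUN = seq_along)
train_ids = ids[position_in_class <= 40]
test_ids = ids[position_in_class > 40]

# A custom resampling with a single train/test split (hold-out)
custom = rsmp("custom")
custom$instantiate(task, train_sets = list(train_ids), test_sets = list(test_ids))

# Class counts in the test set (10 rows per species under this rule)
print(table(task$truth(custom$test_set(1))))
```

The instantiated `custom` object can then be passed as `resamplings = custom` to benchmark_grid().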

EDIT: code example based on the suggestion by pat-s

# use benchmark() from mlr3 to compare different classification models on the iris data set using a manually
# pre-defined partitioning into training and test data sets (hold-out sampling)

library("mlr3verse")

# Instantiate Task
task = tsk("iris")

# Instantiate Custom Resampling

# hold-out sample with pre-defined partitioning into train and test set
# (note: iris rows are ordered by species, 50 rows each, so rows 121:150 are all virginica)
custom = rsmp("custom")
train_sets = list(1:120)
test_sets = list(121:150)
custom$instantiate(task, train_sets, test_sets)


design = benchmark_grid(
  tasks = task,
  learners = lrns(c("classif.ranger", "classif.rpart", "classif.featureless"),
    predict_type = "prob", predict_sets = c("train", "test")),
  resamplings = custom
)

print(design)


# execute the benchmark
bmr = benchmark(design)

measure = msr("classif.acc")

tab = bmr$aggregate(measure)
print(tab)
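The learners above predict on both the "train" and the "test" set, but msr("classif.acc") scores only the test predictions by default. A minimal sketch, assuming you want both numbers side by side, defines one measure per prediction set. The measure ids "acc_train" and "acc_test" are arbitrary names chosen here, and classif.ranger is dropped only to keep the dependencies small.

```r
library("mlr3")

task = tsk("iris")

# Same hold-out split as in the question
custom = rsmp("custom")
custom$instantiate(task, train_sets = list(1:120), test_sets = list(121:150))

design = benchmark_grid(
  tasks = task,
  learners = lrns(c("classif.rpart", "classif.featureless"),
    predict_sets = c("train", "test")),
  resamplings = custom
)
bmr = benchmark(design)

# Accuracy on the training predictions and on the test predictions
measures = list(
  msr("classif.acc", predict_sets = "train", id = "acc_train"),
  msr("classif.acc", predict_sets = "test", id = "acc_test")
)
tab = bmr$aggregate(measures)
print(tab[, c("learner_id", "acc_train", "acc_test")])
```

With this particular split the featureless learner predicts one of the majority training classes (setosa or versicolor), so its test accuracy on the all-virginica test rows is 0, which illustrates why an order-based split of iris is risky.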

Answer 1

You can use the "custom_cv" resampling scheme.
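A sketch of how "custom_cv" might be used, assuming the pre-defined partition is given as a factor with one entry per task row: each factor level becomes one test fold and the remaining rows form the corresponding training set, so this is a cross-validation with fixed folds rather than a single hold-out split. The three-level fold assignment below is a hypothetical example.

```r
library("mlr3")

task = tsk("iris")

# Hypothetical fold assignment: one entry per row, one factor level per fold
folds = factor(rep(c("a", "b", "c"), length.out = task$nrow))

# Cross-validation whose folds are fixed by the factor above
custom_cv = rsmp("custom_cv")
custom_cv$instantiate(task, f = folds)

design = benchmark_grid(
  tasks = task,
  learners = lrns(c("classif.rpart", "classif.featureless")),
  resamplings = custom_cv
)
bmr = benchmark(design)

# Accuracy averaged over the three fixed folds
tab = bmr$aggregate(msr("classif.acc"))
print(tab)
```

If the manual partition is a single train/test split, rsmp("custom") with one training set and one test set is the more direct fit.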
