Unable to train dataset with mlr3proba after encoding and scaling it with mlr3pipelines
I am trying to train a model in mlr3proba after encoding and scaling my dataset with mlr3pipelines, using this code:
task =tsk("sonar")
learner = lrn("classif.rpart")
measure = msr("classif.ce")
inner.rsmp <- rsmp("cv", folds = 5)
train_set = sample(task$nrow, 0.8 * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)
learner <- po("encode") %>>% po("scale") %>>% po("learner", learner)
learner$train(task, row_ids = train_set)
R reports the following error:
Error in learner$train(task, row_ids = train_set) :
unused argument (row_ids = train_set)
I tried this on another dataset and got the same error.
Without the encoding and scaling steps, everything works.
The resample() function also works, even with encoding and scaling:
rr <- resample(task, learner, inner.rsmp)
rr$aggregate(measure)
#Results:
INFO [08:46:55.411] [mlr3] Applying learner 'encode.scale.classif.rpart' on task 'sonar' (iter 4/5)
INFO [08:46:55.539] [mlr3] Applying learner 'encode.scale.classif.rpart' on task 'sonar' (iter 1/5)
INFO [08:46:55.644] [mlr3] Applying learner 'encode.scale.classif.rpart' on task 'sonar' (iter 2/5)
INFO [08:46:55.773] [mlr3] Applying learner 'encode.scale.classif.rpart' on task 'sonar' (iter 5/5)
INFO [08:46:55.876] [mlr3] Applying learner 'encode.scale.classif.rpart' on task 'sonar' (iter 3/5)
rr$score(measure)
task task_id learner learner_id resampling
1: <TaskClassif[46]> sonar <GraphLearner[33]> encode.scale.classif.rpart <ResamplingCV[19]>
2: <TaskClassif[46]> sonar <GraphLearner[33]> encode.scale.classif.rpart <ResamplingCV[19]>
3: <TaskClassif[46]> sonar <GraphLearner[33]> encode.scale.classif.rpart <ResamplingCV[19]>
4: <TaskClassif[46]> sonar <GraphLearner[33]> encode.scale.classif.rpart <ResamplingCV[19]>
5: <TaskClassif[46]> sonar <GraphLearner[33]> encode.scale.classif.rpart <ResamplingCV[19]>
resampling_id iteration prediction classif.ce
1: cv 1 <PredictionClassif[19]> 0.3333333
2: cv 2 <PredictionClassif[19]> 0.2142857
3: cv 3 <PredictionClassif[19]> 0.2380952
4: cv 4 <PredictionClassif[19]> 0.3658537
5: cv 5 <PredictionClassif[19]> 0.2439024
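So what is the problem?
The `%>>%` operator returns a Graph, not a Learner, and a Graph's `$train()` method does not accept a `row_ids` argument. You need to wrap the graph in a GraphLearner: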
library(mlr3)
library(mlr3pipelines)
task =tsk("sonar")
learner = lrn("classif.rpart")
measure = msr("classif.ce")
inner.rsmp <- rsmp("cv", folds = 5)
train_set = sample(task$nrow, 0.8 * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)
learner <- po("encode") %>>% po("scale") %>>% po("learner", learner)
learner <- GraphLearner$new(learner)
learner$train(task, row_ids = train_set)
learner$predict(task, row_ids = test_set)
#> <PredictionClassif> for 42 observations:
#> row_ids truth response
#> 5 R R
#> 12 R R
#> 13 R R
#> ---
#> 188 M M
#> 191 M M
#> 201 M M
Created on 2021-04-30 by the reprex package (v0.3.0)
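As a side note, the `<GraphLearner[33]>` entries in the question's `resample()` output suggest that `resample()` converts the graph to a GraphLearner on its own, which would explain why that call worked. If you prefer to keep a plain Graph, a minimal sketch of an alternative is to subset the task before training, since `Graph$train()` takes only the input task. The seed and the names `graph`, `task_train`, `task_test`, and `prediction` are illustrative assumptions, not part of the original code:

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)  # assumption: fixed seed so the split is reproducible

task = tsk("sonar")
train_set = sample(task$nrow, 0.8 * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)

# Plain Graph, not wrapped in a GraphLearner
graph = po("encode") %>>% po("scale") %>>% po("learner", lrn("classif.rpart"))

# Graph$train() has no row_ids argument, so filter copies of the task instead
task_train = task$clone()$filter(train_set)
task_test = task$clone()$filter(test_set)

graph$train(task_train)

# Graph$predict() returns a named list of outputs; take the single prediction
prediction = graph$predict(task_test)[[1]]
prediction$score(msr("classif.ce"))
```

Wrapping the graph in a GraphLearner is usually more convenient, because the result then behaves like any other learner in `resample()`, `benchmark()`, and tuning.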