mlr3 - Apply pre-processing to new data
I am using the mlr3verse package here. Suppose I applied the following pre-processing to the training set used to train a Learner:
preprocess <- po("scale", param_vals = list(center = TRUE, scale = TRUE)) %>>%
po("encode",param_vals = list(method = "one-hot"))
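I would like to predict the labels of new observations contained in the data frame pred (which holds the original, untransformed variables) with the command predict(Learner, newdata = pred, predict_type = "prob"). This does not work, because Learner was trained on centered, scaled, and one-hot encoded variables.
How can I apply the same pre-processing that was used on the training set to new data (features only, not the response) in order to make predictions?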
I am not 100% sure, but it seems you can wrap the new data in a new task and pass that task to predict. This page shows an example of combining mlr_pipeops and learner objects.
library(dplyr)      # provides the %>% pipe used below
library(mlr3verse)  # attaches mlr3 and mlr3pipelines
df_iris <- iris
# bin Petal.Width into a factor so the "encode" step has a column to one-hot encode
df_iris$Petal.Width = df_iris$Petal.Width %>% cut(breaks = c(0, 0.5, 1, 1.5, 2, Inf))
task = TaskClassif$new(id = "my_iris",
                       backend = df_iris,
                       target = "Species")
# random 80/20 split of row indices (call set.seed() first for a reproducible split)
train_set = sample(task$nrow, 0.8 * task$nrow)
test_set = setdiff(seq_len(task$nrow), train_set)
task_train = TaskClassif$new(id = "my_iris",
                             backend = df_iris[train_set, ], # use train_set
                             target = "Species")
# pre-processing and learner in one graph, so the scaling and encoding
# learned during training are reused at prediction time
graph = po("scale", param_vals = list(center = TRUE, scale = TRUE)) %>>%
  po("encode", param_vals = list(method = "one-hot")) %>>%
  mlr_pipeops$get("learner",
                  learner = mlr_learners$get("classif.rpart"))
graph$train(task_train)
graph$pipeops$encode$state$outtasklayout # inspect model input types
graph$pipeops$classif.rpart$predict_type = "prob"
task_test = TaskClassif$new(id = "my_iris_test",
                            backend = df_iris[test_set, ], # use test_set
                            target = "Species")
pred = graph$predict(task_test)
pred$classif.rpart.output$prob
# when you don't have a target variable, just make up one
df_test2 <- df_iris[test_set, ]
df_test2$Species = sample(df_iris$Species, length(test_set)) # made-up target
task_test2 = TaskClassif$new(id = "my_iris_test",
                             backend = df_test2, # test rows with the made-up target
                             target = "Species")
# the made-up target is not used to compute the predicted probabilities
pred2 = graph$predict(task_test2)
pred2$classif.rpart.output$prob
As @missuse suggested, by using graph <- preprocess %>>% Learner followed by graph_learner <- GraphLearner$new(graph), I was able to predict on the raw data.frame with predict(TunedLearner, newdata = pred, predict_type = "prob").
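For anyone landing here later, this is a minimal sketch of that GraphLearner approach, not the exact code from the comment. It assumes a recent mlr3verse and reuses the binned iris data from the answer above; the names task_train, graph_learner, and new_obs are made up for illustration, and lrn("classif.rpart") stands in for whatever Learner you actually use.

```r
library(mlr3verse)  # attaches mlr3 and mlr3pipelines

set.seed(1)
df <- iris
# Bin Petal.Width into a factor so the one-hot encoding step has work to do
df$Petal.Width <- cut(df$Petal.Width, breaks = c(0, 0.5, 1, 1.5, 2, Inf))

train_rows <- sample(nrow(df), 0.8 * nrow(df))
task_train <- as_task_classif(df[train_rows, ], target = "Species", id = "iris_train")

# Same pre-processing as in the question
preprocess <- po("scale", param_vals = list(center = TRUE, scale = TRUE)) %>>%
  po("encode", param_vals = list(method = "one-hot"))

# Wrap pre-processing + learner into a single Learner object
graph_learner <- GraphLearner$new(
  preprocess %>>% lrn("classif.rpart", predict_type = "prob")
)
graph_learner$train(task_train)

# New observations: raw features only, no Species column
new_obs <- df[-train_rows, setdiff(names(df), "Species")]
prediction <- graph_learner$predict_newdata(new_obs)
head(prediction$prob)
```

Because the scaling parameters and the encoding layout are stored inside the trained graph, the new data frame can be passed in untransformed and without a target column, so there is no need to make up a response.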