predict.train vs predict using recipe objects
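After specifying a recipe to use in caret::train, I'm trying to predict new samples. I have a couple of questions about this, because I couldn't find the answers in the caret/recipes documentation.
- Should I use predict() or predict.train()? What's the difference?
- Should I bake the test data with the prepped recipe before calling predict? When using preProcess directly in train(), the advice is not to preprocess new data, because the train object does this automatically. Is the same true when using recipes?
Below is a reproducible example that illustrates my process and the differences in predictions when using predict vs. predict.train: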
library(recipes)
library(caret)
library(modeldata)  # newer releases ship credit_data in modeldata rather than recipes
# Data ----
data("credit_data")
credit_train <- credit_data[1:3500,]
credit_test <- credit_data[-(1:3500),]
# Set up recipe ----
set.seed(0)
Rec.Obj = recipe(Status ~ ., data = credit_train) %>%
step_impute_knn(all_predictors()) %>%  # called step_knnimpute() in older recipes releases
step_center(all_numeric()) %>%
step_scale(all_numeric())
# Control parameters ----
set.seed(0)
TC = trainControl("cv",number = 10, savePredictions = "final", classProbs = TRUE, returnResamp = "final")
set.seed(0)
Model.Output = train(Rec.Obj,
credit_train,
trControl = TC,
tuneLength = 1,
metric = "Accuracy",
method = "glm")
# Prepped recipe ----
set.seed(0)
prep.rec <-
prep(Rec.Obj, training = credit_train)  # prep()'s data argument is `training`, not `newdata`
# Baked data for observation ----
set.seed(0)
bake.train <- bake(prep.rec, new_data = credit_train)
bake.test <- bake(prep.rec, new_data = credit_test)
# investigation of prediction methods ----
# no application of recipe to newdata
set.seed(0)
predict.norm = predict(Model.Output, credit_test, type = "raw")
pred.train = predict.train(Model.Output, credit_test, type = "raw")  # avoid naming a variable predict.train
identical(predict.norm, pred.train)
# evaluates to FALSE
# Apply recipe to new data (bake.test)
predict.norm.baked = predict(Model.Output, bake.test, type = "raw")
pred.train.baked = predict.train(Model.Output, bake.test, type = "raw")
identical(predict.norm.baked, pred.train.baked)
# evaluates to FALSE
# Compare predict() on raw vs. baked test data
identical(predict.norm, predict.norm.baked)
# evaluates to FALSE
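The recipe is embedded in the train object. The answers differ for two reasons.
First, you gave the recipe (inside Model.Output) data that had already been processed, so it gets processed a second time. You should not give predict() baked data; just call predict() and give it the raw test set.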
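To see why re-baking matters, here is a minimal base-R sketch with made-up numbers of what centering and scaling with training-set statistics does conceptually. The vectors and the helper bake_toy() are hypothetical illustrations, not part of recipes. Applying the same stored mean and standard deviation a second time, which is roughly what happens when baked data is handed to predict() on a recipe-based train object, pushes the values far from what the model saw during fitting.

```r
# Made-up training and test values (illustration only)
train_x <- c(2, 4, 6, 8)
test_x  <- c(3, 7)

# "prep": estimate the centering/scaling statistics on the training data only
mu    <- mean(train_x)
sigma <- sd(train_x)

# Hypothetical helper mimicking a prepped center + scale step:
# it always reuses the stored training statistics
bake_toy <- function(x) (x - mu) / sigma

once  <- bake_toy(test_x)  # raw test data processed once (what predict() does internally)
twice <- bake_toy(once)    # already-baked data processed again (the mistake)

print(round(once, 3))
print(round(twice, 3))
```

The processed-once values are symmetric around zero, as the training data would be; the processed-twice values are all shifted well below zero, so the model is fed inputs unlike anything it was trained on.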
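Second, let S3 do its thing: predict.train is for the x/y interface and predict.train.recipe is for the recipe interface. Just call predict() and it will dispatch to the appropriate method.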
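The dispatch point can be illustrated in plain base R. The classes and methods below are hypothetical toy stand-ins, not caret's actual internals: the model object carries a more specific class whose predict() method preprocesses newdata before handing off to the general method, which assumes its input is already processed. Calling predict() lets S3 pick the specific method, while calling the general method by name skips the preprocessing, mirroring the answer's explanation of why predict() and predict.train() disagree.

```r
# Hypothetical toy model object: stores a centering constant learned at fit time.
# The more specific class comes first, so S3 dispatch tries its method first.
new_toy_model <- function(center) {
  structure(list(center = center),
            class = c("toy_model.recipe", "toy_model"))
}

# General method: assumes newdata has ALREADY been preprocessed
predict.toy_model <- function(object, newdata, ...) {
  newdata$x > 0
}

# Specific method: apply the stored preprocessing ("bake"), then call the general method
predict.toy_model.recipe <- function(object, newdata, ...) {
  newdata$x <- newdata$x - object$center
  predict.toy_model(object, newdata, ...)
}

m   <- new_toy_model(center = 10)
raw <- data.frame(x = c(5, 15))

via_generic <- predict(m, raw)            # dispatches to predict.toy_model.recipe
via_direct  <- predict.toy_model(m, raw)  # bypasses the preprocessing step

print(via_generic)
print(via_direct)
```

In the question's workflow the counterpart of via_generic is simply predict(Model.Output, newdata = credit_test), passing the raw credit_test.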