recipes::step_dummy + caret::train -> Error:Not all variables in the recipe are present
recipes::step_dummy + caret::train -> Error:Not all variables in the recipe are present
我在将 recipes::step_dummy 与 caret::train 一起使用时出现以下错误(第一次尝试合并这两个包):
Error: Not all variables in the recipe are present in the supplied
training set
不确定导致错误的原因,也不确定最佳调试方法。帮助训练模型将不胜感激。
library(caret)
library(tidyverse)
library(recipes)
library(rsample)
data("credit_data")
## Split the data into training (75%) and test sets (25%)
set.seed(100)
train_test_split <- initial_split(credit_data)
credit_train <- training(train_test_split)
credit_test <- testing(train_test_split)
# Create recipe for data pre-processing
rec_obj <- recipe(Status ~ ., data = credit_train) %>%
step_knnimpute(all_predictors()) %>%
#step_other(Home, Marital, threshold = .2, other = "other") %>%
#step_other(Job, threshold = .2, other = "others") %>%
step_dummy(Records) %>%
step_center(all_numeric()) %>%
step_scale(all_numeric()) %>%
prep(training = credit_train, retain = TRUE)
train_data <- juice(rec_obj)
test_data <- bake(rec_obj, credit_test)
set.seed(1055)
# the glm function models the second factor level.
lrfit <- train(rec_obj, data = train_data,
method = "glm",
trControl = trainControl(method = "repeatedcv",
repeats = 5))
看来你还需要 train 函数中的公式(尽管在配方中列出)?...
glmfit <- train(Status ~ ., data = juice(rec_obj),
method = "glm",
trControl = trainControl(method = "repeatedcv", repeats = 5))
在将食谱交给 train
之前不要准备食谱并使用原始训练集:
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(rsample)
data("credit_data")
## Split the data into training (75%) and test sets (25%)
set.seed(100)
train_test_split <- initial_split(credit_data)
credit_train <- training(train_test_split)
credit_test <- testing(train_test_split)
# Create recipe for data pre-processing
rec_obj <-
recipe(Status ~ ., data = credit_train) %>%
step_knnimpute(all_predictors()) %>%
#step_other(Home, Marital, threshold = .2, other = "other") %>%
#step_other(Job, threshold = .2, other = "others") %>%
step_dummy(Records) %>%
step_center(all_numeric()) %>%
step_scale(all_numeric())
set.seed(1055)
# the glm function models the second factor level.
lrfit <- train(rec_obj, data = credit_train,
method = "glm",
trControl = trainControl(method = "repeatedcv",
repeats = 5))
lrfit
#> Generalized Linear Model
#>
#> 3341 samples
#> 13 predictor
#> 2 classes: 'bad', 'good'
#>
#> Recipe steps: knnimpute, dummy, center, scale
#> Resampling: Cross-Validated (10 fold, repeated 5 times)
#> Summary of sample sizes: 3006, 3008, 3007, 3007, 3007, 3007, ...
#> Resampling results:
#>
#> Accuracy Kappa
#> 0.7965349 0.4546223
由 reprex package (v0.2.1)
于 2019-03-20 创建
我在将 recipes::step_dummy 与 caret::train 一起使用时出现以下错误(第一次尝试合并这两个包):
Error: Not all variables in the recipe are present in the supplied training set
不确定导致错误的原因,也不确定最佳调试方法。帮助训练模型将不胜感激。
library(caret)
library(tidyverse)
library(recipes)
library(rsample)
data("credit_data")
## Split the data into training (75%) and test sets (25%)
set.seed(100)
train_test_split <- initial_split(credit_data)
credit_train <- training(train_test_split)
credit_test <- testing(train_test_split)
# Create recipe for data pre-processing
rec_obj <- recipe(Status ~ ., data = credit_train) %>%
step_knnimpute(all_predictors()) %>%
#step_other(Home, Marital, threshold = .2, other = "other") %>%
#step_other(Job, threshold = .2, other = "others") %>%
step_dummy(Records) %>%
step_center(all_numeric()) %>%
step_scale(all_numeric()) %>%
prep(training = credit_train, retain = TRUE)
train_data <- juice(rec_obj)
test_data <- bake(rec_obj, credit_test)
set.seed(1055)
# the glm function models the second factor level.
lrfit <- train(rec_obj, data = train_data,
method = "glm",
trControl = trainControl(method = "repeatedcv",
repeats = 5))
看来你还需要 train 函数中的公式(尽管在配方中列出)?...
glmfit <- train(Status ~ ., data = juice(rec_obj),
method = "glm",
trControl = trainControl(method = "repeatedcv", repeats = 5))
在将食谱交给 train
之前不要准备食谱并使用原始训练集:
library(caret)
#> Loading required package: lattice
#> Loading required package: ggplot2
library(tidyverse)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(rsample)
data("credit_data")
## Split the data into training (75%) and test sets (25%)
set.seed(100)
train_test_split <- initial_split(credit_data)
credit_train <- training(train_test_split)
credit_test <- testing(train_test_split)
# Create recipe for data pre-processing
rec_obj <-
recipe(Status ~ ., data = credit_train) %>%
step_knnimpute(all_predictors()) %>%
#step_other(Home, Marital, threshold = .2, other = "other") %>%
#step_other(Job, threshold = .2, other = "others") %>%
step_dummy(Records) %>%
step_center(all_numeric()) %>%
step_scale(all_numeric())
set.seed(1055)
# the glm function models the second factor level.
lrfit <- train(rec_obj, data = credit_train,
method = "glm",
trControl = trainControl(method = "repeatedcv",
repeats = 5))
lrfit
#> Generalized Linear Model
#>
#> 3341 samples
#> 13 predictor
#> 2 classes: 'bad', 'good'
#>
#> Recipe steps: knnimpute, dummy, center, scale
#> Resampling: Cross-Validated (10 fold, repeated 5 times)
#> Summary of sample sizes: 3006, 3008, 3007, 3007, 3007, 3007, ...
#> Resampling results:
#>
#> Accuracy Kappa
#> 0.7965349 0.4546223
由 reprex package (v0.2.1)
于 2019-03-20 创建