Recipe for XGBoost tidymodels. Error: unused argument (values)

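I am currently experimenting with hyperparameter tuning for an XGBoost regression on a time series, using a Latin hypercube sampling strategy. When I run the code below, all models fail during the tune_grid operation. The cause appears to be the recipe object: I use step_dummy() to transform the values column of the univariate time series, and the .notes object contains the error message: preprocessor 1/1: Error: unused argument (values)

I have found other posts where this problem came up, but none of the solutions helped in my case.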

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))
library(timetk)
library(tidymodels)
library(modeltime)
library(tictoc)


dates <- ymd("2016-01-01")+ months(0:59)
fake_values <- 
  c(64,61, 90,138,240,141,123, 9,180,95,84,69,76,104,122,183,200,268,225,
    132,84,159,64,131,98,138,179,187,303,257,175,133,145,36,3,134,137,308,
    84,114,310,266,123,131,87,94,86,100,105,147,159,232,312,337,285,188,257,10,98,27
  )
df <- bind_cols(fake_values, dates) %>% 
  rename(c(values = ...1, dates = ...2)
  )

# training- and test set
data_splits <- initial_time_split(df, prop = 0.8)
data_train  <- training(data_splits)
data_test   <- testing(data_splits)

resampling_strategy <- 
  data_train %>%
  time_series_cv(
    initial = "12 months",
    assess = "3 months",
    skip = "3 months",
    cumulative  = TRUE,
    slice_limit = 3
)

# recipe
basic_rec <- recipe(values ~ ., data = data_train)  %>% 
  step_dummy(all_nominal(values), -all_outcomes()) 

basic_rec %>% prep()

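It looks like the problem is that the date predictor has not been converted to the numeric values that xgboost needs. You did use step_dummy(), but a date is not a factor/nominal variable, so all_nominal() does not select it. If you select it explicitly, this is what happens: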

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

dates <- ymd("2016-01-01") + months(0:59)
fake_values <- 
  c(64,61, 90,138,240,141,123, 9,180,95,84,69,76,104,122,183,200,268,225,
    132,84,159,64,131,98,138,179,187,303,257,175,133,145,36,3,134,137,308,
    84,114,310,266,123,131,87,94,86,100,105,147,159,232,312,337,285,188,257,10,98,27
  )
df <- bind_cols(fake_values, dates) %>% 
  rename(c(values = ...1, dates = ...2)
  )
#> New names:
#> * NA -> ...1
#> * NA -> ...2

# training- and test set
data_splits <- initial_time_split(df, prop = 0.8)
data_train  <- training(data_splits)
data_test   <- testing(data_splits)

basic_rec <- recipe(values ~ ., data = data_train) %>% 
  step_dummy(dates) 

basic_rec %>% prep() %>% bake(new_data = NULL)
#> Warning: The following variables are not factor vectors and will be ignored:
#> `dates`
#> Error: The `terms` argument in `step_dummy` did not select any factor columns.

Created on 2021-10-27 by the reprex package (v2.0.1)

You probably want to handle the date with something like step_date().
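As a rough sketch of what that could look like (not necessarily the right feature set for your series): step_date() derives features from the date column, step_rm() drops the raw date, and step_dummy() then encodes the month factor that step_date() creates. The choice of features ("month" and "year"), the use of step_rm(), and the names date_rec and baked are my own assumptions for illustration. As far as I can tell, the original "unused argument (values)" message comes from passing values to all_nominal(), which takes no arguments, so the selector is called without arguments here.

```r
library(tidymodels)
library(lubridate)

# Same toy data as in the question, built with explicit column names
df <- tibble(
  dates  = ymd("2016-01-01") + months(0:59),
  values = c(
    64, 61, 90, 138, 240, 141, 123, 9, 180, 95, 84, 69, 76, 104, 122, 183,
    200, 268, 225, 132, 84, 159, 64, 131, 98, 138, 179, 187, 303, 257, 175,
    133, 145, 36, 3, 134, 137, 308, 84, 114, 310, 266, 123, 131, 87, 94, 86,
    100, 105, 147, 159, 232, 312, 337, 285, 188, 257, 10, 98, 27
  )
)

data_train <- training(initial_time_split(df, prop = 0.8))

date_rec <- recipe(values ~ ., data = data_train) %>%
  # derive a month factor and a numeric year from the date column
  step_date(dates, features = c("month", "year")) %>%
  # drop the raw Date column, which xgboost cannot use directly
  step_rm(dates) %>%
  # all_nominal() is called without arguments; the outcome is excluded
  step_dummy(all_nominal(), -all_outcomes())

baked <- date_rec %>% prep() %>% bake(new_data = NULL)
glimpse(baked)
```

After baking, every remaining column should be numeric, which is what the xgboost engine expects. Whether month dummies and a year trend are useful predictors for your data is a separate modeling question.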