Error: The first argument to [fit_resamples()] should be either a model or workflow

Question:

I am working through a tutorial by Julia Silge (link here) on using tidymodels and recipes. Most of it runs without trouble, but when I call fit_resamples() I get this error: Error: The first argument to [fit_resamples()] should be either a model or workflow.

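I am copying the code from the tutorial character for character, and everything runs fine, including printing validation_splits. As soon as I call fit_resamples(), however, I get the error above (link to relevant part of tutorial). In case it helps, the output of rlang::last_error() is: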

<error/rlang_error>

The first argument to [fit_resamples()] should be either a model or workflow.
Backtrace:
 
     1. tune::fit_resamples(...)
     2. tune:::fit_resamples.default(...)

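Does anyone know what is going on here, and how I can fix it? My understanding is that the first argument I pass to fit_resamples() is a model, namely children ~ ., and I use that formula without problems earlier in the script. The code (and data) that produces the error on my machine, followed by my sessionInfo(), is below.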

Reproducible example:

library(tidyverse)

## Bring in data
hotels <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-11/hotels.csv')

hotel_stays <- hotels %>% 
  filter(is_canceled == 0) %>% 
  mutate(children = case_when(children + babies > 0 ~ 'children',
                              TRUE ~ 'none'),
         required_car_parking_spaces = case_when(required_car_parking_spaces > 0 ~ 'parking', 
                                                 TRUE ~ 'none')) %>% 
  select(-is_canceled, -reservation_status, -babies)

hotels_df <- hotel_stays %>% 
  select(children, hotel, arrival_date_month, meal, adr, adults, 
         required_car_parking_spaces, total_of_special_requests, 
         stays_in_week_nights, stays_in_weekend_nights) %>% 
  mutate_if(is.character, factor)

## Build models
library(tidymodels)

set.seed(1234)
hotel_split <- initial_split(hotels_df)
hotel_train <- training(hotel_split)
hotel_test <- testing(hotel_split)

hotel_rec <- recipe(children ~ ., data = hotel_train) %>% 
  step_downsample(children) %>% 
  step_dummy(all_nominal(), -all_outcomes()) %>% 
  step_zv(all_numeric()) %>% 
  step_normalize(all_numeric()) %>% 
  prep()

test_proc <- bake(hotel_rec, new_data = hotel_test)

knn_spec <- nearest_neighbor() %>% 
  set_engine('kknn') %>% 
  set_mode('classification')
knn_fit <- knn_spec %>% 
  fit(children ~ ., 
      data=juice(hotel_rec))
knn_fit

## Evaluate models
set.seed(1234)
validation_splits <- mc_cv(juice(hotel_rec), prop = 0.9, strata = children)
validation_splits

## This is where I get the error
knn_res <- fit_resamples(
  children ~ ., 
  knn_spec,
  validation_splits,
  control = control_resamples(save_pred = TRUE)
)

My sessionInfo():

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GGally_2.1.2.9000  skimr_2.1.3        silgelib_0.1.1     forcats_0.5.1     
 [5] stringr_1.4.0      readr_1.4.0        tidyverse_1.3.1    knitr_1.33        
 [9] yardstick_0.0.8    workflowsets_0.0.2 workflows_0.2.2    tune_0.1.5        
[13] tidyr_1.1.3        tibble_3.1.2       rsample_0.1.0      recipes_0.1.16    
[17] purrr_0.3.4        parsnip_0.1.6      modeldata_0.1.0    infer_0.5.4       
[21] ggplot2_3.3.5      dplyr_1.0.7        dials_0.0.9        scales_1.1.1      
[25] broom_0.7.6        tidymodels_0.1.3  

loaded via a namespace (and not attached):
 [1] colorspace_2.0-1   ellipsis_0.3.2     class_7.3-19       base64enc_0.1-3   
 [5] fs_1.5.0           rstudioapi_0.13    listenv_0.8.0      furrr_0.2.3       
 [9] farver_2.1.0       prodlim_2019.11.13 fansi_0.5.0        lubridate_1.7.10  
[13] xml2_1.3.2         codetools_0.2-18   splines_4.1.0      jsonlite_1.7.2    
[17] pROC_1.17.0.1      dbplyr_2.1.1       shiny_1.6.0        compiler_4.1.0    
[21] httr_1.4.2         backports_1.2.1    assertthat_0.2.1   Matrix_1.3-3      
[25] fastmap_1.1.0      cli_2.5.0          later_1.2.0        htmltools_0.5.1.1 
[29] prettyunits_1.1.1  tools_4.1.0        igraph_1.2.6       gtable_0.3.0      
[33] glue_1.4.2         Rcpp_1.0.6         cellranger_1.1.0   DiceDesign_1.9    
[37] vctrs_0.3.8        iterators_1.0.13   timeDate_3043.102  gower_0.2.2       
[41] xfun_0.23          globals_0.14.0     rvest_1.0.0        mime_0.10         
[45] lifecycle_1.0.0    kknn_1.3.1         future_1.21.0      MASS_7.3-54       
[49] ipred_0.9-11       hms_1.1.0          promises_1.2.0.1   parallel_4.1.0    
[53] RColorBrewer_1.1-2 yaml_2.2.1         curl_4.3.1         rpart_4.1-15      
[57] reshape_0.8.8      stringi_1.6.2      foreach_1.5.1      lhs_1.1.1         
[61] lava_1.6.9         repr_1.1.3         rlang_0.4.11       pkgconfig_2.0.3   
[65] evaluate_0.14      lattice_0.20-44    htmlwidgets_1.5.3  labeling_0.4.2    
[69] tidyselect_1.1.1   parallelly_1.26.0  plyr_1.8.6         magrittr_2.0.1    
[73] R6_2.5.0           generics_0.1.0     DBI_1.1.1          pillar_1.6.1      
[77] haven_2.4.1        withr_2.4.2        survival_3.2-11    nnet_7.3-16       
[81] modelr_0.1.8       crayon_1.4.1       utf8_1.2.1         rmarkdown_2.8     
[85] progress_1.2.2     grid_4.1.0         readxl_1.3.1       reprex_2.0.0      
[89] digest_0.6.27      xtable_1.8-4       httpuv_1.6.1       GPfit_1.0-8       
[93] munsell_0.5.0 

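The blog post you are following is fairly old, and there was a change to tune a while back: you should now put the workflow or model first. A bare formula such as children ~ . is neither of those, so fit_resamples() falls through to its default method, which raises the error message: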

The first argument to [fit_resamples()] should be either a model or workflow.

The fix is to pass your model or workflow as the first argument, like so:

knn_res <- fit_resamples(
  knn_spec,
  children ~ ., 
  validation_splits,
  control = control_resamples(save_pred = TRUE)
)
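The answer also mentions a workflow as a valid first argument. As a rough sketch of that route, the block below bundles the model specification and the formula into a workflow() and passes that single object to fit_resamples(). To stay self-contained it uses the built-in iris data as a stand-in for the hotels data, so toy_splits, knn_wf, and the Species ~ . formula are illustrative assumptions rather than part of the original post. It also assumes the kknn package is installed.

```r
library(tidymodels)

# Stand-in data (assumption): built-in iris instead of the hotels data
set.seed(1234)
toy_splits <- mc_cv(iris, prop = 0.9, times = 5, strata = Species)

# Same model specification as in the question
knn_spec <- nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("classification")

# Bundle model and formula so a single object goes first
knn_wf <- workflow() %>%
  add_model(knn_spec) %>%
  add_formula(Species ~ .)

# The workflow is the first argument, the resamples come second
knn_res <- fit_resamples(
  knn_wf,
  resamples = toy_splits,
  control = control_resamples(save_pred = TRUE)
)

collect_metrics(knn_res)
```

With the objects from the question, the same pattern would presumably use add_formula(children ~ .) and validation_splits in place of the stand-ins.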