tidymodels

Question

I get an error when I try to use the workflowsets package. Here is the R code (sorry, it is long):

# Package ----
library(finetune)
library(themis)
library(tidymodels)

# Data ----
data("PimaIndiansDiabetes", package = "mlbench")

table(PimaIndiansDiabetes$diabetes)
str(PimaIndiansDiabetes)
PimaIndiansDiabetes <- 
  PimaIndiansDiabetes %>% 
  mutate(diabetes = relevel(diabetes, "pos"))

# Split ----
set.seed(123)
ind <- initial_split(PimaIndiansDiabetes, strata = diabetes)

dat_train <- training(ind)
dat_test <- testing(ind)

# CV ----
set.seed(123)
dat_cv <- vfold_cv(dat_train, v = 10)

# Recipe ----
dat_rec <- 
  dat_train %>% 
  recipe(diabetes ~.) %>% 
  step_normalize(all_numeric_predictors()) %>% 
  step_smote(diabetes)

# Model ----
parsnip_nn <- 
  mlp(hidden_units = tune(),
      penalty = tune(),
      epochs = tune()) %>% 
  set_mode("classification") %>% 
  set_engine("nnet")

parsnip_log <- 
  logistic_reg(penalty = tune(),
               mixture = tune()) %>% 
  set_engine("glmnet")

# Latin hypercube grid ----
latin_grid <- 
  grid_latin_hypercube(penalty(),
                       mixture(),
                       hidden_units(),
                       epochs(),
                       size = 30)

# Tuning ----
race_ctrl <-
  control_race(
    save_pred = TRUE,
    save_workflow = TRUE,
    verbose = TRUE
  )

class_metrics <- metric_set(accuracy, 
                            f_meas, 
                            j_index, 
                            kap, 
                            precision, 
                            sensitivity, 
                            specificity, 
                            roc_auc, 
                            mcc, 
                            pr_auc)

Tuned_results <- 
  workflow_set(
    preproc = list(rec = dat_rec),
    models = list(parsnip_nn = parsnip_nn,
                  parsnip_log = parsnip_log)
  ) %>% 
  workflow_map(
    fn = "tune_race_anova", 
    seed = 123,
    grid = latin_grid,
    resamples = dat_cv,
    verbose = TRUE,
    metrics = class_metrics,
    control = race_ctrl
  )

This is the error I get. It basically says that tune() does not recognize some of the models' parameters.

i 1 of 2 tuning: rec_parsnip_nn
x 1 of 2 tuning: rec_parsnip_nn failed with: Error in check_grid(grid = grid, workflow = workflow, pset = pset) : The provided `grid` has the following parameter columns that have not been marked for tuning by `tune()`: 'mixture'.
i 2 of 2 tuning: rec_parsnip_log
x 2 of 2 tuning: rec_parsnip_log failed with: Error in check_grid(grid = grid, workflow = workflow, pset = pset) : The provided `grid` has the following parameter columns that have not been marked for tuning by `tune()`: 'hidden_units', 'epochs'.

If we inspect the resulting workflow set (Tuned_results in the code above):

# A workflow set/tibble: 2 x 4
  wflow_id        info             option    result        
  <chr>           <list>           <list>    <list>        
1 rec_parsnip_nn  <tibble [1 x 4]> <opts[4]> <try-errr [1]>
2 rec_parsnip_log <tibble [1 x 4]> <opts[4]> <try-errr [1]>

I am not sure why parameters such as mixture, hidden_units, and epochs are not recognized by tune(). Any idea what I am doing wrong?

Answer 1

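A neural network has no parameter called mixture, and a regularized regression model has no parameters called hidden_units or epochs. You cannot pass the same grid to both models because they do not share the same hyperparameters: every column in the grid must match a parameter marked with tune() in the workflow it is applied to. Instead, you need to:

Create a separate grid for each model
Use option_add() with its id argument to attach each grid to its own workflow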
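
The two steps above could look roughly like the sketch below. It rebuilds a trimmed-down version of the question's setup so that it runs on its own. The names nn_grid, log_grid, and wf_set are illustrative, and the metric set and control options from the question are left at their defaults to keep the example short.

```r
library(tidymodels)
library(finetune)
library(themis)

data("PimaIndiansDiabetes", package = "mlbench")

set.seed(123)
dat_cv <- vfold_cv(PimaIndiansDiabetes, v = 10)

dat_rec <-
  recipe(diabetes ~ ., data = PimaIndiansDiabetes) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_smote(diabetes)

parsnip_nn <-
  mlp(hidden_units = tune(), penalty = tune(), epochs = tune()) %>%
  set_mode("classification") %>%
  set_engine("nnet")

parsnip_log <-
  logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

# One grid per model, holding only the parameters that model tunes
nn_grid  <- grid_latin_hypercube(hidden_units(), penalty(), epochs(), size = 30)
log_grid <- grid_latin_hypercube(penalty(), mixture(), size = 30)

# Attach each grid to its own workflow through the workflow id
wf_set <-
  workflow_set(
    preproc = list(rec = dat_rec),
    models = list(parsnip_nn = parsnip_nn, parsnip_log = parsnip_log)
  ) %>%
  option_add(grid = nn_grid, id = "rec_parsnip_nn") %>%
  option_add(grid = log_grid, id = "rec_parsnip_log")

# No grid argument here: each workflow uses the grid stored in its options
Tuned_results <-
  wf_set %>%
  workflow_map(
    fn = "tune_race_anova",
    seed = 123,
    resamples = dat_cv,
    verbose = TRUE
  )
```

The ids passed to option_add() are the wflow_id values that workflow_set() builds from the names of the preprocessor and model lists, which is why they match the ids shown in the error output.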

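Also have a look at Chapter 15 of TMwR (Tidy Modeling with R) for more on how to add options to specific workflows only. Since you are using a Latin hypercube, which is the default in tidymodels, you may just want to skip all of this and use grid = 30 instead.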
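
A minimal sketch of the grid = 30 shortcut follows, assuming the same data and model specifications as the question; the recipe is reduced to normalization and the names rec, nn_spec, and log_spec are illustrative. With an integer grid, the tuning function builds a separate set of candidates for each workflow from that workflow's own tune() parameters, so no column can end up in front of the wrong model.

```r
library(tidymodels)
library(finetune)

data("PimaIndiansDiabetes", package = "mlbench")

set.seed(123)
folds <- vfold_cv(PimaIndiansDiabetes, v = 10)

rec <-
  recipe(diabetes ~ ., data = PimaIndiansDiabetes) %>%
  step_normalize(all_numeric_predictors())

nn_spec <-
  mlp(hidden_units = tune(), penalty = tune(), epochs = tune()) %>%
  set_mode("classification") %>%
  set_engine("nnet")

log_spec <-
  logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

# grid = 30 asks for 30 candidates per workflow, generated separately
# from each workflow's own tuning parameters
tuned <-
  workflow_set(
    preproc = list(rec = rec),
    models = list(parsnip_nn = nn_spec, parsnip_log = log_spec)
  ) %>%
  workflow_map(
    fn = "tune_race_anova",
    seed = 123,
    grid = 30,
    resamples = folds
  )
```

The trade-off is control: the parameter ranges are the package defaults, so explicit grids attached with option_add() are still the way to go when custom ranges are needed.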

Error in tidymodels - workflowsets : The provided `grid` has the following parameter columns that have not been marked for tuning by `tune()`

r

machine-learning