recipes package cannot create interaction term in step_interact

I am using a medical insurance dataset to sharpen my modeling skills. It looks like this:

> insur_dt
      age    sex    bmi children smoker    region   charges
   1:  19 female 27.900        0    yes southwest 16884.924
   2:  18   male 33.770        1     no southeast  1725.552
   3:  28   male 33.000        3     no southeast  4449.462
   4:  33   male 22.705        0     no northwest 21984.471
   5:  32   male 28.880        0     no northwest  3866.855
  ---                                                      
1334:  50   male 30.970        3     no northwest 10600.548
1335:  18 female 31.920        0     no northeast  2205.981
1336:  18 female 36.850        0     no southeast  1629.833
1337:  21 female 25.800        0     no southwest  2007.945
1338:  61 female 29.070        0    yes northwest 29141.360

I am using recipes, as part of the tidymodels meta-package, to prepare my data for modeling, and I have determined that bmi, age, and smoker form an interaction term.

insur_split <- initial_split(insur_dt)

insur_train <- training(insur_split)
insur_test <- testing(insur_split)

# we are going to do data processing and feature engineering with recipes

# below, we are going to predict charges from age, bmi and smoker
insur_rec <- recipe(charges ~ age + bmi + smoker, data = insur_train) %>%
    step_dummy(all_nominal()) %>%
    step_zv(all_numeric()) %>%
    step_normalize(all_numeric()) %>%
    step_interact(~ bmi:smoker:age) %>% 
    prep()

According to the tidymodels guide/documentation, I have to specify the interaction as a step in the recipe, step_interact. However, when I try to do so I get an error:

> insur_rec <- recipe(charges ~ age + bmi + smoker, data = insur_train) %>%
+     step_dummy(all_nominal()) %>%
+     step_zv(all_numeric()) %>%
+     step_normalize(all_numeric()) %>%
+     step_interact(~ bmi:smoker:age) %>% 
+     prep()
Interaction specification failed for: ~bmi:smoker:age. No interactions will be created.partial match of 'object' to 'objects'

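I am new to modeling and not sure why this error occurs. I just want to express that charges is explained by all the other predictors, and that smoker (a yes/no factor), age (numeric), and bmi (double) all interact with one another to inform the outcome. What am I doing wrong?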

From the documentation:

step_interact can create interactions between variables. It is primarily intended for numeric data; categorical variables should probably be converted to dummy variables using step_dummy() prior to being used for interactions.

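step_dummy(all_nominal()) turned the variable smoker into smoker_yes, so by the time step_interact runs there is no longer a column named smoker. Below, you can see that I simply changed the name smoker in the interaction term to smoker_yes.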

insur_rec <- recipe(charges ~ bmi + age + smoker, data = insur_train) %>%
    step_dummy(all_nominal()) %>%
    step_normalize(all_numeric(), -all_outcomes()) %>%
    step_interact(terms = ~ bmi:age:smoker_yes) %>% 
    prep(verbose = TRUE, log_changes = TRUE)
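If you are unsure what step_dummy() named the new column, one way to check is to prep a recipe that stops after step_dummy() and look at the baked column names. The sketch below does that on a small made-up data frame (toy is hypothetical and only stands in for insur_train), then builds the interaction with starts_with("smoker") instead of hard-coding smoker_yes. It assumes a recipes version recent enough to support bake(new_data = NULL); on older versions juice() plays the same role.

```r
library(recipes)

# Hypothetical toy data standing in for insur_train (values are made up)
toy <- data.frame(
  age     = c(19, 18, 28, 33, 32, 50, 61, 45),
  bmi     = c(27.9, 33.8, 33.0, 22.7, 28.9, 31.0, 29.1, 25.3),
  smoker  = factor(c("yes", "no", "no", "no", "no", "no", "yes", "yes")),
  charges = c(16885, 1726, 4449, 21984, 3867, 10601, 29141, 20500)
)

# Step 1: stop right after step_dummy() to see what the dummy column is called
dummy_rec <- recipe(charges ~ bmi + age + smoker, data = toy) %>%
  step_dummy(all_nominal()) %>%
  prep()
dummy_names <- names(bake(dummy_rec, new_data = NULL))
print(dummy_names)  # smoker is gone; a smoker_yes column appears instead

# Step 2: refer to the dummy column with a selector rather than by exact name
inter_rec <- recipe(charges ~ bmi + age + smoker, data = toy) %>%
  step_dummy(all_nominal()) %>%
  step_normalize(all_numeric(), -all_outcomes()) %>%
  step_interact(terms = ~ bmi:age:starts_with("smoker")) %>%
  prep()
inter_names <- names(bake(inter_rec, new_data = NULL))
print(inter_names)  # the interaction column name uses the default "_x_" separator
```

The selector form is mainly useful when a factor has more than two levels, since step_dummy() then creates several columns and starts_with() picks up all of them.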