Why do we need prep, bake, and juice in tidymodels?

Question

I always fit my model and make predictions without using prep(), bake(), or juice():

rec_wflow <- 
  workflow() %>% 
  add_model(lr_mod) %>% 
  add_recipe(rec)

data_fit <- 
  rec_wflow %>% 
  fit(data = train_data)

Are these functions (prep, bake, juice) only meant for inspecting the result of the data preprocessing, rather than being required for the fitting/training process?

I learned the code above from the official tutorial.

I read on another blog that using train_data this way causes data leakage. I would like to hear more about this: are these functions related to data leakage?

Answer 1

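Short answer: you are correct. When the recipe is used inside a workflow, as in your example, you do not need to call the preprocessing functions yourself.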

This is covered in the tutorial "Handle class imbalance in #TidyTuesday climbing expedition data with tidymodels":

We’re going to use this recipe in a workflow() so we don’t need to stress a lot about whether to prep() or not. If you want to explore what the recipe is doing to your data, you can first prep() the recipe to estimate the parameters needed for each step and then bake(new_data = NULL) to pull out the training data with those steps applied.
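To make the quoted advice concrete, here is a minimal sketch of inspecting a recipe by hand. The data set (the built-in mtcars), the row split, and the choice of step_normalize() are assumptions made for illustration and are not taken from the question. The sketch also shows why prep() matters for data leakage: the step parameters are estimated from the training rows only and are then reused unchanged when new data is baked.

```r
library(recipes)

# Hypothetical train/test split of the built-in mtcars data (illustration only)
train_data <- mtcars[1:24, ]
test_data  <- mtcars[25:32, ]

# A recipe that centers and scales the numeric predictors
rec <- recipe(mpg ~ disp + hp + wt, data = train_data) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the parameters of each step (here, means and standard
# deviations) from the training data only
rec_prepped <- prep(rec, training = train_data)

# bake(new_data = NULL) returns the training data with the steps applied
train_baked <- bake(rec_prepped, new_data = NULL)

# Baking new data reuses the training-set statistics, so nothing about the
# test rows influences the preprocessing
test_baked <- bake(rec_prepped, new_data = test_data)

head(train_baked)
```

When the recipe is added to a workflow(), fit() and predict() carry out the equivalent of these steps internally, so the manual calls are only needed for inspection. juice() is the older way to retrieve the processed training data and has been superseded by bake(new_data = NULL).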

I recommend all the tutorials on Julia's blog for learning tidymodels.


r

r-recipes

tidymodels