Tidymodels Tuning Recipe Parameters
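With tidymodels, I really like that I can tune not only model parameters but also some recipe steps, for example the number of components in step_pls(). The problem is that I am having trouble restricting the range of possible values. For example, if I want to use step_umap(), I want to limit the search space to 2:5 components. When I replace step_pls() with step_umap(), the following code crashes the session. It tries to build a UMAP with around 50 components...
So basically, my question is: when using grid_random or grid_max_entropy, how can I restrict the search range for a specific tuning parameter?
Note: I also tried something like param_grid %>% grid_random(size = 5, num_comp() %>% range_set(c(3, 5))), but it seems to be ignored.
Thanks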
# Load Packages -----------------------------------------------------------
require(tidyverse)
require(lubridate)
require(tidymodels)
require(rsample)
require(themis)
require(recipes)
require(embed)

# Load Data ---------------------------------------------------------------
data <- read_csv("....data.csv")

# Modelling - Data Partition ----------------------------------------------
split_prop <- 0.80
init_split <- initial_time_split(data, prop = split_prop)
set_train <- training(init_split)
set_test <- testing(init_split)

# Modelling - Resamples ---------------------------------------------------
valid_folds <- rsample::vfold_cv(set_train, v = 5)

# Modelling - Data Transf -------------------------------------------------
recip_train <- recipe(label ~ ., data = set_train) %>%
  step_normalize(all_predictors()) %>%
  step_pls(all_predictors(), outcome = "label", num_comp = tune())

# Modelling - Model Specs -------------------------------------------------
model_glm <- linear_reg() %>%
  set_args(penalty = tune(),
           mixture = tune()) %>%
  set_mode("regression") %>%
  set_engine("glmnet")

# Workflow ----------------------------------------------------------------
wflw <- workflow() %>%
  add_recipe(recip_train) %>%
  add_model(model_glm)

# Modelling - Tuning Control ----------------------------------------------
ctr_tune <- control_grid(
  verbose = TRUE,
  allow_par = TRUE,
  extract = NULL,
  save_pred = TRUE,
  pkgs = NULL
)

param_grid <- wflw %>%
  parameters() %>%
  finalize(set_train) %>%
  grid_max_entropy(size = 5)

# Modelling - Tuning ------------------------------------------------------
tuning <- tune_grid(object = wflw,
                    resamples = valid_folds,
                    grid = param_grid,
                    control = ctr_tune,
                    metrics = metric_set(rmse))
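If you want to try a specific range for num_comp, I wouldn't bother going into the workflow and pulling out the parameters, etc. I would set up the tuning grid directly from the parameters: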
library(dials)
#> Loading required package: scales
grid_max_entropy(penalty(),
                 mixture(),
                 num_comp(range = c(2, 5)),
                 size = 5)
#> # A tibble: 5 x 3
#>         penalty mixture num_comp
#>           <dbl>   <dbl>    <int>
#> 1 0.00161        0.721         5
#> 2 0.751          0.376         4
#> 3 0.00000000974  0.395         3
#> 4 0.000107       0.0747        4
#> 5 0.0000000451   0.906         3
Created on 2020-07-19 by the reprex package (v0.3.0)
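If the grid does need to come from a parameter set, as in the question's wflw %>% parameters() pipeline, one possible alternative is to overwrite the range of num_comp in that set with update() before building the grid. The following is only a sketch under assumptions: it builds the parameter set by hand with parameters(penalty(), mixture(), num_comp()) as a stand-in for the workflow's parameter set, and it uses grid_random(), one of the two grid functions named in the question. The object name param_set is illustrative.

```r
library(dials)

# Stand-in for `wflw %>% parameters()`: a parameter set holding the same
# three tuning parameters as the workflow in the question (assumption).
param_set <- parameters(penalty(), mixture(), num_comp())

# Overwrite the default num_comp range so only 2 to 5 components are sampled.
param_set <- update(param_set, num_comp = num_comp(range = c(2L, 5L)))

# Draw a random grid from the restricted parameter set.
set.seed(123)
param_grid <- grid_random(param_set, size = 5)
param_grid
```

The resulting tibble has the columns penalty, mixture and num_comp, so it should be usable as grid = param_grid in the tune_grid() call from the question. In more recent tidymodels releases, extract_parameter_set_dials() is, as far as I know, the documented way to pull the parameter set out of a workflow in place of parameters().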