Error with tune_grid function from R package tidymodels
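I've been following the code from Julia Silge's YouTube video on sentiment analysis of Animal Crossing user reviews with tidymodels (https://www.youtube.com/watch?v=whE85O1XCkg&t=1300s). At minute 25 she uses tune_grid(), and when I try to use it in my script I get this warning/error:
Warning message:
All models failed in tune_grid(). See the `.notes` column.
In .notes, the following appears 25 times: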
[[1]]
# A tibble: 1 x 1
.notes
<chr>
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~
How can I fix this? I'm using the same code that Julia uses. My entire code is:
library(tidyverse)

user_reviews <- read_tsv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/user_reviews.tsv")

user_reviews %>%
  count(grade) %>%
  ggplot(aes(grade, n)) +
  geom_col()

user_reviews %>%
  filter(grade > 0) %>%
  sample_n(5) %>%
  pull(text)

reviews_parsed <- user_reviews %>%
  mutate(
    text = str_remove(text, "Expand"),
    rating = case_when(grade > 6 ~ "Good", TRUE ~ "Bad")
  )

library(tidytext)

words_per_review <- reviews_parsed %>%
  unnest_tokens(word, text) %>%
  count(user_name, name = "total_words", sort = TRUE)

words_per_review %>%
  ggplot(aes(total_words)) +
  geom_histogram()

library(tidymodels)

set.seed(123)
review_split <- initial_split(reviews_parsed, strata = rating)
review_train <- training(review_split)
review_test <- testing(review_split)

library(textrecipes)

review_rec <- recipe(rating ~ text, data = review_train) %>%
  step_tokenize(text) %>%
  step_stopwords(text) %>%
  step_tokenfilter(text, max_tokens = 500) %>%
  step_tfidf(text) %>%
  step_normalize(all_predictors())

review_prep <- prep(review_rec)
review_prep
juice(review_prep)

lasso_spec <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

lasso_wf <- workflow() %>%
  add_recipe(review_rec) %>%
  add_model(lasso_spec)
lasso_wf

lambda_grid <- grid_regular(penalty(), levels = 30)

set.seed(123)
review_folds <- bootstraps(review_train, strata = rating)
review_folds

doParallel::registerDoParallel()

set.seed(2020)
lasso_grid <- tune_grid(
  lasso_wf,
  resamples = review_folds,
  grid = lambda_grid,
  metrics = metric_set(roc_auc, ppv, npv)
)
lasso_grid
Warning message:
All models failed in tune_grid(). See the `.notes` column.
lasso_grid$.notes
[[1]]
# A tibble: 1 x 1
.notes
<chr>
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~
[[2]]
# A tibble: 1 x 1
.notes
<chr>
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~
[[3]]
# A tibble: 1 x 1
.notes
<chr>
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~
etc... to 25.
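The notes printed above are cut off at the `~`, so the class that `prep` could not dispatch on is not visible. A minimal sketch of pulling out the untruncated text follows. Here `notes` is a hypothetical stand-in for `lasso_grid$.notes`, built by hand so the snippet runs on its own, and the message string is a placeholder rather than the real error text.

```r
# Hypothetical stand-in for lasso_grid$.notes: a list holding one
# single-column data frame per resample. The message is a placeholder.
notes <- replicate(
  3,
  data.frame(
    .notes = "recipe: Error in UseMethod(\"prep\"): <rest of message>",
    stringsAsFactors = FALSE
  ),
  simplify = FALSE
)

# Extract the first note of each resample as a plain character vector,
# which prints in full instead of being shortened by the tibble printer.
full_notes <- vapply(notes, function(x) x$.notes[1], character(1))

# Collapse identical messages so a repeated failure is shown only once.
distinct_notes <- unique(full_notes)
cat(distinct_notes, sep = "\n")
```

With the real object, swapping `notes` for `lasso_grid$.notes` should work the same way, since each element shown above is a tibble with a single `.notes` column.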
I found the solution in the comments section of the post. This worked for me (a Windows user) and made the grid tuning almost 4 times faster.
all_cores <- parallel::detectCores(logical = FALSE)
library(doParallel)
cl <- makePSOCKcluster(all_cores)
registerDoParallel(cl)
set.seed(2020)
lasso_grid <- tune_grid(
  lasso_wf,
  resamples = review_folds,
  grid = lambda_grid,
  metrics = metric_set(roc_auc, ppv, npv),
  control = control_grid(pkgs = c('textrecipes'))
)
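A plausible reading of why `pkgs = c('textrecipes')` helps: a PSOCK cluster starts fresh R sessions, and a package attached in the main session is not automatically loaded in the workers, so the S3 methods it registers would be missing there. The `pkgs` argument of `control_grid()` names packages to load during parallel processing. The sketch below illustrates the underlying behaviour with base R only. `splines` is used purely as a stand-in for textrecipes because it ships with R, and the two-worker cluster is an arbitrary choice.

```r
library(parallel)

# Start two fresh R worker sessions (PSOCK clusters also work on Windows).
cl <- makePSOCKcluster(2)

# Attach a package in the main session only; `splines` stands in for textrecipes.
library(splines)

# Ask each worker whether that package's namespace is loaded there.
before <- unlist(clusterEvalQ(cl, "splines" %in% loadedNamespaces()))

# Load the package on every worker explicitly, then check again.
invisible(clusterEvalQ(cl, library(splines)))
after <- unlist(clusterEvalQ(cl, "splines" %in% loadedNamespaces()))

print(before)
print(after)

# Release the worker processes when done.
stopCluster(cl)
```

The code in the answer leaves its cluster running, so calling `stopCluster(cl)` once tuning has finished releases those worker processes as well.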