Why does deploying a tidymodel with vetiver throw an error when there's a variable with the role ID?
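I can't deploy a tidymodel with vetiver and get predictions when the model includes a variable whose role is set to ID in the recipe. I get the following error, as shown in the screenshot: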
{
  "error": "500 - Internal server error",
  "message": "Error: The following required columns are missing: 'Fake_ID'.\n"
}
The code for a dummy example is below.
Do I need to remove the ID variable from the model and the recipe to make the Plumber API work?
# Load libraries
library(recipes)
library(parsnip)
library(workflows)
library(pins)
library(plumber)
library(stringi)
library(vetiver)   # needed for vetiver_pin_write() and vetiver_api() below
# Load data
data(Sacramento, package = "modeldata")
# Create fake IDs for testing
Sacramento$Fake_ID <- stri_rand_strings(nrow(Sacramento), 10)
# Train model; Fake_ID is kept in the recipe but given the "ID" role
Sacramento_recipe <-
  recipe(formula = price ~ type + sqft + beds + baths + zip + Fake_ID,
         data = Sacramento) %>%
  update_role(Fake_ID, new_role = "ID") %>%
  step_zv(all_predictors())
rf_spec <- rand_forest(mode = "regression") %>% set_engine("ranger")
rf_fit <-
  workflow() %>%
  add_model(rf_spec) %>%
  add_recipe(Sacramento_recipe) %>%
  fit(Sacramento)
# Create vetiver object
v <- vetiver_model(rf_fit, "sacramento_rf")
v
# Allow for model versioning and sharing
model_board <- board_temp()
model_board %>% vetiver_pin_write(v)
# Deploy model
pr() %>%
  vetiver_api(v) %>%
  pr_run(port = 8088)
(Screenshot: running the example in the Plumber API)
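As of today, vetiver looks at the "mold" (workflows::extract_mold(rf_fit)) and takes only the predictors from it to create the ptype. However, when you predict from a workflow, it does require all the variables, including non-predictors. If you have trained a model with non-predictors, as of today you can make the API work by passing in a custom ptype: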
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
library(parsnip)
library(workflows)
library(pins)
library(plumber)
library(stringi)
data(Sacramento, package = "modeldata")
Sacramento$Fake_ID <- stri_rand_strings(nrow(Sacramento), 10)
Sacramento_recipe <-
recipe(formula = price ~ type + sqft + beds + baths + zip + Fake_ID,
data = Sacramento) %>%
update_role(Fake_ID, new_role = "ID") %>%
step_zv(all_predictors())
rf_spec <- rand_forest(mode = "regression") %>% set_engine("ranger")
rf_fit <-
workflow() %>%
add_model(rf_spec) %>%
add_recipe(Sacramento_recipe) %>%
fit(Sacramento)
library(vetiver)
## this is probably easiest because this model uses a simple formula
## if there is more complex preprocessing, select the variables
## from `Sacramento` via dplyr or similar
sac_ptype <- extract_recipe(rf_fit) %>%
bake(new_data = Sacramento, -all_outcomes()) %>%
vctrs::vec_ptype()
v <- vetiver_model(rf_fit, "sacramento_rf", save_ptype = sac_ptype)
v
#>
#> ── sacramento_rf ─ <butchered_workflow> model for deployment
#> A ranger regression modeling workflow using 6 features
pr() %>%
vetiver_api(v)
#> # Plumber router with 2 endpoints, 4 filters, and 0 sub-routers.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/ping (GET)
#> └──/predict (POST)
Created on 2022-03-10 by the reprex package (v2.0.1)
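To see why the original request fails, the column check can be mimicked in plain base R. This is a minimal sketch with a hypothetical helper, check_required_columns(), which is not vetiver's or hardhat's actual code; the column types and the set of required columns are assumptions based on the formula in the question. It shows that a prototype built from the predictors alone lacks Fake_ID, so input shaped like it trips a missing-column error, while a custom prototype that also carries the ID column passes.

```r
# Hypothetical helper that mimics a prediction-time column check.
# It is NOT the actual vetiver or hardhat implementation.
check_required_columns <- function(new_data, required) {
  missing_cols <- setdiff(required, names(new_data))
  if (length(missing_cols) > 0) {
    stop("The following required columns are missing: ",
         paste0("'", missing_cols, "'", collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Columns the recipe needs when baking new data: the five predictors
# plus the ID column (assumed from the formula in the question)
recipe_columns <- c("type", "sqft", "beds", "baths", "zip", "Fake_ID")

# Zero-row prototype with predictors only, like one built from the mold
predictor_ptype <- data.frame(
  type = factor(), sqft = integer(), beds = integer(),
  baths = numeric(), zip = factor()
)

# Custom prototype that also carries the ID column
custom_ptype <- predictor_ptype
custom_ptype$Fake_ID <- character()

# Input shaped like the predictors-only prototype is rejected
msg <- tryCatch(
  check_required_columns(predictor_ptype, recipe_columns),
  error = function(e) conditionMessage(e)
)
msg
#> [1] "The following required columns are missing: 'Fake_ID'"

# Input shaped like the custom prototype passes the check
check_required_columns(custom_ptype, recipe_columns)
```

The point of the sketch is only the mismatch between the two sets of columns; in the real deployment the check happens inside the workflow's predict() call on the API side.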
Are you training production models with non-predictor variables? Would you mind opening an issue on GitHub to explain your use case a bit more?
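The comments in the reprex suggest selecting the variables from Sacramento directly when the preprocessing is more complex. Below is a minimal sketch of that route, assuming dplyr, vctrs, and modeldata are installed. The sprintf() IDs stand in for stri_rand_strings() only to drop the stringi dependency, and sac_ptype_alt is a hypothetical name. The resulting zero-row prototype keeps Fake_ID, so it could be passed as save_ptype in the same way as sac_ptype.

```r
library(dplyr)

data(Sacramento, package = "modeldata")

# Stand-in for stri_rand_strings(): deterministic, dependency-free IDs
Sacramento$Fake_ID <- sprintf("id%04d", seq_len(nrow(Sacramento)))

# Keep exactly the columns the recipe needs at prediction time
# (predictors plus the ID column; the outcome, price, is dropped),
# then reduce them to a zero-row prototype
sac_ptype_alt <- Sacramento %>%
  select(type, sqft, beds, baths, zip, Fake_ID) %>%
  vctrs::vec_ptype()

sac_ptype_alt

# With the fitted workflow from the answer, it would then be used as:
# v <- vetiver::vetiver_model(rf_fit, "sacramento_rf", save_ptype = sac_ptype_alt)
```

Selecting columns by hand avoids relying on bake(), which matters when the recipe creates or drops columns, but it also means the column list has to be kept in sync with the recipe manually.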