dplyr: case_when - how to use it to select a column?
I want to use case_when from dplyr to select a column and change its role in a tidymodels recipe.
What am I doing wrong?
In the following MWE, the ID role should be assigned to column "b":
library(tidyverse)
library(tidymodels)
# dummy data
a = seq(1:3)
b = seq(4:6)
c = seq(7:9)
df <- data.frame(a,b,c)
# filter variable
col_name = "foo"
rec <- recipe(a ~., data = df) %>%
  update_role(
    case_when(
      col_name == "foo" ~ b, # Also not working: .$b, df$b
      col_name == "foo2" ~ c),
    new_role = "ID")
rec
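Unfortunately, case_when is not suited to the kind of dynamic variable selection you are trying to do. Instead, I suggest using if (...) wrapped in a function to perform the dynamic selection: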
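library(tidyverse)
library(tidymodels)
# dummy data
# (seq(1:3), seq(4:6) and seq(7:9) all return 1 2 3, so the three columns
# are identical; that does not matter for assigning roles)
a = seq(1:3)
b = seq(4:6)
c = seq(7:9)
df <- data.frame(a,b,c)
# filter variable
col_name = "foo"
# pick the column to update based on the value of col_name
update_select <- function(recipe, col_name) {
  if (col_name == "foo") {
    update_role(recipe, b, new_role = "ID")
  } else if (col_name == "foo2") {
    update_role(recipe, c, new_role = "ID")
  } else {
    # without this branch the function would silently return NULL
    stop("Unknown col_name: ", col_name)
  }
}
rec <- recipe(a ~ ., data = df) %>%
  update_select(col_name)
rec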
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>         ID          1
#>    outcome          1
#>  predictor          1
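If you would still like to use case_when() for this, one possible sketch is below. It assumes the goal is only to pick a column by a key. Since case_when() returns a vector of values rather than a column selection, it can return the column name as a string outside the recipe, and that string can then be passed to update_role() through all_of(). The variable name `picked` and the simplified dummy data are illustrative, not part of the original example.

```r
library(dplyr)
library(recipes)

# simplified dummy data with the same column names as above
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
col_name <- "foo"

# case_when() returns a value; here that value is a column *name*
picked <- case_when(
  col_name == "foo"  ~ "b",
  col_name == "foo2" ~ "c"
)

# all_of() turns the character vector into a column selection
rec <- recipe(a ~ ., data = df) %>%
  update_role(all_of(picked), new_role = "ID")

# one row per variable, with its current role
summary(rec)
```

If no condition matches, case_when() returns NA, so it is worth checking `picked` for NA before building the recipe.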
There are a few different ways to do this. For the example you show here, I would use a named vector that holds the column names:
library(recipes)
# dummy data
a = seq(1:3)
b = seq(4:6)
c = seq(7:9)
df <- data.frame(a,b,c)
selector_vec <- c("foo" = "b", "foo2" = "c")
## could select more than one term here
my_terms <- selector_vec[["foo"]]
rec1 <- recipe(a ~ ., data = df) %>%
  update_role(all_of(my_terms), new_role = "ID")
prep(rec1)$term_info
#> # A tibble: 3 x 4
#>   variable type    role      source
#>   <chr>    <chr>   <chr>     <chr>
#> 1 b        numeric ID        original
#> 2 c        numeric predictor original
#> 3 a        numeric outcome   original
my_terms <- selector_vec[["foo2"]]
rec2 <- recipe(a ~ ., data = df) %>%
  update_role(all_of(my_terms), new_role = "ID")
prep(rec2)$term_info
#> # A tibble: 3 x 4
#>   variable type    role      source
#>   <chr>    <chr>   <chr>     <chr>
#> 1 b        numeric predictor original
#> 2 c        numeric ID        original
#> 3 a        numeric outcome   original
Created on 2021-05-24 by the reprex package (v2.0.0)
In a more realistic situation, I would use across() as shown here.
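The comment in the code above notes that more than one term could be selected. A minimal sketch of that idea follows, assuming a hypothetical extra column `d` and a hypothetical named list `selector_list` (neither appears in the original example), where one key maps to several columns:

```r
library(recipes)

# dummy data with a hypothetical fourth column `d`
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

# a named list lets one key map to several column names
selector_list <- list(
  foo  = "b",
  foo2 = c("c", "d")
)

my_terms <- selector_list[["foo2"]]

# all_of() accepts a character vector of any length
rec <- recipe(a ~ ., data = df) %>%
  update_role(all_of(my_terms), new_role = "ID")

# one row per variable, with its current role
summary(rec)
```

A list is used instead of a named character vector because each element of a character vector can hold only a single column name.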