dplyr: case_when - how to use it to select a column?

I want to use case_when from dplyr to select a column and change its role in a tidymodels recipe.

What am I doing wrong? In the following MWE, the ID role should be assigned to column "b":

library(tidyverse)
library(tidymodels)

# dummy data
a = 1:3
b = 4:6
c = 7:9
df <- data.frame(a,b,c)

# filter variable
col_name = "foo"

rec <- recipe(a ~., data = df) %>%
  update_role(
              case_when(
                col_name == "foo" ~ b, # also not working: .$b, df$b
                col_name == "foo2" ~ c), 
              new_role = "ID")
rec

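Unfortunately, case_when is not suited to the kind of dynamic variable selection you are trying to achieve: it returns a vector of values, whereas update_role expects a column selection. Instead, I would suggest using if (...) wrapped in a function to perform the dynamic selection: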

library(tidyverse)
library(tidymodels)

# dummy data
a = 1:3
b = 4:6
c = 7:9
df <- data.frame(a,b,c)

# filter variable
col_name = "foo"

update_select <- function(recipe, col_name) {
  if (col_name == "foo") {
    update_role(recipe, b, new_role = "ID")
  } else if (col_name == "foo2") {
    update_role(recipe, c, new_role = "ID")
  } else {
    # no match: return the recipe unchanged instead of NULL
    recipe
  }
}

rec <- recipe(a ~., data = df) %>%
  update_select(col_name)
rec
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>         ID          1
#>    outcome          1
#>  predictor          1
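
If the list of keys grows, the if/else chain can get long. A minimal sketch of the same idea with switch(), assuming it is acceptable to stop with an error on an unknown key. The name update_select_switch is made up for illustration, and the column is passed as a string through all_of():

```r
library(recipes)

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Hypothetical helper: map a key to a column name, then update that
# column's role. The last, unnamed switch() argument is the fallback
# and is only evaluated when no key matches.
update_select_switch <- function(rec, col_name) {
  target <- switch(col_name,
    foo  = "b",
    foo2 = "c",
    stop("Unknown col_name: ", col_name)
  )
  update_role(rec, all_of(target), new_role = "ID")
}

rec <- recipe(a ~ ., data = df) %>%
  update_select_switch("foo2")

# one row per variable, with its current role
summary(rec)
```

Failing loudly on an unknown key is a design choice here; returning the recipe unchanged would work just as well if a silent fallback is preferred.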

There are a few different ways to do this. For the example you have shown here, I think I would use a named vector holding the column names:

library(recipes)

# dummy data
a = 1:3
b = 4:6
c = 7:9
df <- data.frame(a,b,c)

selector_vec <- c("foo" = "b", "foo2" = "c")

## could select more than one term here
my_terms <- selector_vec[["foo"]]
rec1 <- recipe(a ~ ., data = df) %>%
  update_role(all_of(my_terms), new_role = "ID")
prep(rec1)$term_info
#> # A tibble: 3 x 4
#>   variable type    role      source  
#>   <chr>    <chr>   <chr>     <chr>   
#> 1 b        numeric ID        original
#> 2 c        numeric predictor original
#> 3 a        numeric outcome   original

my_terms <- selector_vec[["foo2"]]
rec2 <- recipe(a ~ ., data = df) %>%
  update_role(all_of(my_terms), new_role = "ID")
prep(rec2)$term_info
#> # A tibble: 3 x 4
#>   variable type    role      source  
#>   <chr>    <chr>   <chr>     <chr>   
#> 1 b        numeric predictor original
#> 2 c        numeric ID        original
#> 3 a        numeric outcome   original

Created on 2021-05-24 by the reprex package (v2.0.0)
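
If you would still like to keep case_when in the picture, one option is to let it return the column name as a string and hand that string to all_of(), the same way as with the named vector. This is only a sketch under that assumption; id_col is a name I made up for illustration:

```r
library(dplyr)
library(recipes)

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
col_name <- "foo"

# case_when() returns the column *name* as a string here,
# not the column itself
id_col <- case_when(
  col_name == "foo"  ~ "b",
  col_name == "foo2" ~ "c"
)

rec <- recipe(a ~ ., data = df) %>%
  update_role(all_of(id_col), new_role = "ID")

# one row per variable, with its current role
summary(rec)
```

Note that case_when gives NA when no condition matches, so an unmatched key would need to be handled before the call to update_role.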

In a more realistic situation, I would use across() as shown here