How to combine filter(across(starts_with("foo"), ~ . logical-condition)) with mutate(bar = map2(...))?

I would like to use dplyr's filter() together with selection helpers such as starts_with().

This post is a follow-up to an earlier question, but in a scenario that involves list-columns and map2() from the {purrr} package.

Consider the following my_mtcars data frame:

library(tibble)

my_mtcars <-
  mtcars %>%
  rownames_to_column("cars")

I want to filter on any column whose name starts with (or contains) the string "cars", keeping only the following cars:

cars_to_keep <- c("Merc 240D", "Fiat X1-9", "Ferrari Dino")

From the earlier question, we learned how to use selection helpers with filter(), like so:

library(dplyr)

filter(my_mtcars, across(contains("cars"), ~ . %in% cars_to_keep))

##           cars  mpg cyl  disp  hp drat    wt qsec vs am gear carb
## 1    Merc 240D 24.4   4 146.7  62 3.69 3.190 20.0  1  0    4    2
## 2    Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.9  1  1    4    1
## 3 Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6

So far so good.

The problem arises with the following data structure:

higher_level_tibble <- 
  tibble(my_data         = list(my_mtcars),
         the_cars_i_want = list(cars_to_keep))

## # A tibble: 1 x 2
##   my_data        the_cars_i_want
##   <list>         <list>         
## 1 <df [32 x 12]> <chr [3]>  

Although the following works:

library(purrr)

higher_level_tibble %>%
  mutate(my_filtered_data = map2(.x = my_data, .y = the_cars_i_want, .f = ~filter(.x, cars %in% .y)))

## # A tibble: 1 x 3
##   my_data        the_cars_i_want my_filtered_data
##   <list>         <list>          <list>          
## 1 <df [32 x 12]> <chr [3]>       <df [3 x 12]>   

this does not:

higher_level_tibble %>%
  mutate(my_filtered_data = map2(.x = my_data, .y = the_cars_i_want, .f = ~ filter(.x, across(starts_with("cars"), ~ . %in% .y))))

Error: Problem with mutate() column my_filtered_data.
i my_filtered_data = map2(...).
x Problem with filter() input ..1.
i Input ..1 is across(starts_with("cars"), ~. %in% .y).
x the ... list contains fewer than 2 elements

How can I use tidyselect helpers inside filter(), all within purrr::map2()?


EDIT


Desired output

higher_level_tibble %>%
  mutate(my_filtered_data = map2(.x = my_data, 
                                 .y = the_cars_i_want, 
                                 .f = ~ .x %>% filter( from the col in .x whose header starts with "cars", return only values that appear in .y )))

## # A tibble: 1 x 3
##   my_data        the_cars_i_want my_filtered_data
##   <list>         <list>          <list>          
## 1 <df [32 x 12]> <chr [3]>       <df [3 x 12]>  

A possible solution, using purrr::pmap_dfr:

library(tidyverse)

my_mtcars <-
  mtcars %>%
  rownames_to_column("cars")

cars_to_keep <- c("Merc 240D", "Fiat X1-9", "Ferrari Dino")

higher_level_tibble <- 
  tibble(my_data         = list(my_mtcars),
         the_cars_i_want = list(cars_to_keep))

higher_level_tibble %>% 
  pmap_dfr(~ ..1 %>% filter(across(contains("cars"), \(x) x %in% ..2))) %>% 
  nest(my_filtered_data = everything()) %>% 
  bind_cols(higher_level_tibble, .)

#> # A tibble: 1 × 3
#>   my_data        the_cars_i_want my_filtered_data 
#>   <list>         <list>          <list>           
#> 1 <df [32 × 12]> <chr [3]>       <tibble [3 × 12]>
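
One thing to keep in mind with the approach above: pmap_dfr() row-binds all results into a single data frame, so it fits the one-row case shown here. If higher_level_tibble had several rows, it may be easier to keep one filtered data frame per row by calling pmap() inside mutate(). The following is only a sketch under that assumption; two_row_tibble, df and keep are names made up for illustration, and if_all() (which newer dplyr versions suggest instead of across() inside filter()) requires dplyr 1.0.4 or later.

```r
library(dplyr)
library(purrr)
library(tibble)

my_mtcars <- rownames_to_column(mtcars, "cars")

# Hypothetical two-row variant of higher_level_tibble
two_row_tibble <- tibble(
  my_data         = list(my_mtcars, my_mtcars),
  the_cars_i_want = list(c("Merc 240D", "Fiat X1-9"), "Ferrari Dino")
)

# pmap() returns a list, so each row keeps its own filtered data frame
filtered <- two_row_tibble %>%
  mutate(my_filtered_data = pmap(
    list(my_data, the_cars_i_want),
    function(df, keep) {
      filter(df, if_all(contains("cars"), function(x) x %in% keep))
    }
  ))

# Number of rows kept in each filtered data frame
rows_kept <- map_int(filtered$my_filtered_data, nrow)
```

Here rows_kept holds one count per row of two_row_tibble, which makes it easy to check that each row was filtered against its own vector of cars.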

@Paul Smith's answer suggests that my own code, which did not work, i.e.

higher_level_tibble %>%
  mutate(my_filtered_data = map2(.x = my_data, .y = the_cars_i_want, .f = ~ filter(.x, across(starts_with("cars"), ~ . %in% .y))))

can be fixed by using an anonymous function, e.g.:

higher_level_tibble %>%
  mutate(my_filtered_data = map2(.x = my_data, .y = the_cars_i_want, .f = ~ filter(.x, across(starts_with("cars"), function(x) x %in% .y))))

## # A tibble: 1 x 3
##   my_data        the_cars_i_want my_filtered_data
##   <list>         <list>          <list>          
## 1 <df [32 x 12]> <chr [3]>       <df [3 x 12]>
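
A likely explanation for why this works: every purrr-style lambda (~) binds its own .x and .y (aliases for its first and second argument). In the failing version, the .y inside the inner ~ . %in% .y therefore refers to the inner lambda's second argument, which across() never supplies, rather than to the .y of map2(). That would account for "the ... list contains fewer than 2 elements". The sketch below first reproduces the shadowing in isolation and then shows the reverse fix, where the outer function gets named arguments and the inner lambda stays; inner, df and keep are names introduced here for illustration, and if_all() assumes dplyr 1.0.4 or later.

```r
library(dplyr)
library(purrr)
library(tibble)

# A one-argument call to a lambda that mentions .y fails,
# because .y is an alias for the (missing) second argument.
inner <- as_mapper(~ . %in% .y)
shadowing_result <- tryCatch(inner("Merc 240D"), error = function(e) e)

my_mtcars <- rownames_to_column(mtcars, "cars")
cars_to_keep <- c("Merc 240D", "Fiat X1-9", "Ferrari Dino")

higher_level_tibble <- tibble(
  my_data         = list(my_mtcars),
  the_cars_i_want = list(cars_to_keep)
)

# Outer function with named arguments: the inner lambda's .x/.y
# can no longer shadow the values passed in by map2().
result <- higher_level_tibble %>%
  mutate(my_filtered_data = map2(
    my_data, the_cars_i_want,
    function(df, keep) filter(df, if_all(starts_with("cars"), ~ .x %in% keep))
  ))

kept_cars <- result$my_filtered_data[[1]]$cars
```

Either direction should do: what matters is that the two nested functions do not both rely on .x and .y.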