Filter dataframe when column name-value pairs are stored in a list?

I have a data frame like this:

library(dplyr)  # provides %>% and filter(), used unqualified below

df <- tibble::rownames_to_column(USArrests, "State") %>% 
  tidyr::pivot_longer(cols = -State)

head(df)
# A tibble: 6 x 3
  State   name     value
  <chr>   <chr>    <dbl>
1 Alabama Murder    13.2
2 Alabama Assault  236  
3 Alabama UrbanPop  58  
4 Alabama Rape      21.2
5 Alaska  Murder    10  
6 Alaska  Assault  263  

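In a separate list object l I have the columns I need to filter on. The element names are the column names, and the values identify the rows I want to remove from the data frame: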

l <- list(State = c("Alabama", "Pennsylvania", "Texas"),
          name = c("Murder", "Assault"))

Hard-coding it would look like this:

dplyr::filter(df, !State %in% c("Alabama", "Pennsylvania", "Texas"), !name %in% c("Murder", "Assault"))

   State      name     value
   <chr>      <chr>    <dbl>
 1 Alaska     UrbanPop  48  
 2 Alaska     Rape      44.5
 3 Arizona    UrbanPop  80  
 4 Arizona    Rape      31  
 5 Arkansas   UrbanPop  50  
 6 Arkansas   Rape      19.5
 7 California UrbanPop  91  
 8 California Rape      40.6
 9 Colorado   UrbanPop  78  
10 Colorado   Rape      38.7
# ... with 84 more rows

However, l changes often, so I cannot (and do not want to) hard-code it. I tried the following, but only the last expression is evaluated:

library(purrr)
filter_expr <- imap_chr(l, ~ paste0("! ", 
                     .y, 
                     " %in% c(\"", 
                     paste(.x, collapse = "\",\""), 
                     "\")")) %>% parse(text = .)

filter(df, eval(filter_expr))

   State      name     value
   <chr>      <chr>    <dbl>
 1 Alabama    UrbanPop  58  
 2 Alabama    Rape      21.2
 3 Alaska     UrbanPop  48  
 4 Alaska     Rape      44.5
 5 Arizona    UrbanPop  80  
 6 Arizona    Rape      31  
 7 Arkansas   UrbanPop  50  
 8 Arkansas   Rape      19.5
 9 California UrbanPop  91  
10 California Rape      40.6
# ... with 90 more rows

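Is there a more tidyverse-friendly way to filter df when the filter conditions are stored in a structure like l?

I looked at this, but the expressions there are not dynamic.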

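We can use across inside filter to loop over the names of 'l', build a logical expression by subsetting 'l' with the current column name (cur_column()) as the key, and negate it (!). Note that cur_column() currently works only inside across, not inside if_all/if_any (dplyr 1.0.6 on R 4.1.0).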

library(dplyr)
df %>% 
   filter(across(all_of(names(l)), ~ !. %in% l[[cur_column()]]))

-output

# A tibble: 94 x 3
#   State      name     value
#   <chr>      <chr>    <dbl>
# 1 Alaska     UrbanPop  48  
# 2 Alaska     Rape      44.5
# 3 Arizona    UrbanPop  80  
# 4 Arizona    Rape      31  
# 5 Arkansas   UrbanPop  50  
# 6 Arkansas   Rape      19.5
# 7 California UrbanPop  91  
# 8 California Rape      40.6
# 9 Colorado   UrbanPop  78  
#10 Colorado   Rape      38.7
# … with 84 more rows
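
As a side note on the attempt in the question, here is a minimal base R sketch of why only the last condition took effect: eval() on an expression vector evaluates every element in turn but returns only the value of the last one. The vector x and the two toy conditions are made up for illustration.

```r
# Toy data and two conditions parsed from strings (illustrative values only)
x <- c(1, 5, 10)
conds <- parse(text = c("x > 3", "x < 8"))

# eval() on an expression vector returns only the value of the last element,
# so this is just the result of `x < 8`
last_only <- eval(conds)

# Evaluating each element separately and combining with & applies both conditions
all_conds <- Reduce(`&`, lapply(conds, function(e) eval(e)))
```

The same reasoning applies inside filter(): each condition has to be evaluated on its own and the resulting logical vectors combined.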

If we set an attribute holding the column name, we can use if_all

library(magrittr)
df %>% 
  mutate(across(all_of(names(l)), ~ set_attr(., 'cn', cur_column()))) %>% 
  filter(if_all(all_of(names(l)), ~ ! . %in% l[[attr(., 'cn')]]))

Or with imap/reduce

library(purrr)
df %>%
    filter(imap(l, ~ !cur_data()[[.y]] %in% .x) %>%
                 reduce(`&`))

Or another option is anti_join

for(nm in names(l)) df <- anti_join(df, tibble(!! nm := l[[nm]]))
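
The loop above overwrites df on each pass. A possible variant, sketched here under the assumption that purrr is available, folds over the names of l with reduce() and leaves the original data frame untouched. The name result is only illustrative, and by = nm is added to make the join key explicit.

```r
library(dplyr)
library(purrr)

df <- tibble::rownames_to_column(USArrests, "State") %>%
  tidyr::pivot_longer(cols = -State)

l <- list(State = c("Alabama", "Pennsylvania", "Texas"),
          name = c("Murder", "Assault"))

# Start from df and drop matching rows one column at a time.
# Each step anti-joins against a one-column tibble named after the current key.
result <- reduce(
  names(l),
  function(acc, nm) anti_join(acc, tibble(!!nm := l[[nm]]), by = nm),
  .init = df
)
```

Because every step removes rows, the conditions combine the same way as in the hard-coded filter: a row is dropped if it matches in any of the listed columns.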

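Another potential option here is to use purrr to build a logical vector, which allows for & or | conditions and gives access to the current column name (.y) without cur_column, which can only be used inside across:
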
df %>% 
  filter(imap(l, ~ !df[[.y]] %in% .x) %>% reduce(`&`)) # can use magrittr::and

Output

   State      name     value
   <chr>      <chr>    <dbl>
 1 Alaska     UrbanPop  48  
 2 Alaska     Rape      44.5
 3 Arizona    UrbanPop  80  
 4 Arizona    Rape      31  
 5 Arkansas   UrbanPop  50  
 6 Arkansas   Rape      19.5
 7 California UrbanPop  91  
 8 California Rape      40.6
 9 Colorado   UrbanPop  78  
10 Colorado   Rape      38.7
# ... with 84 more rows

A variant that uses | instead of & is:

df %>% 
  filter(imap(l, ~ !df[[.y]] %in% .x) %>% reduce(`|`)) # can use magrittr::or
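
The same idea can be wrapped in a small reusable function. What follows is only a base R sketch, not part of any package: the helper name filter_out and its combine argument are hypothetical, and the long data frame is rebuilt without tidyverse so the block runs on its own.

```r
# Rebuild the long-format data in base R (same row order as pivot_longer above)
long <- data.frame(
  State = rep(rownames(USArrests), each = ncol(USArrests)),
  name  = rep(colnames(USArrests), times = nrow(USArrests)),
  value = as.vector(t(as.matrix(USArrests)))
)

l <- list(State = c("Alabama", "Pennsylvania", "Texas"),
          name = c("Murder", "Assault"))

# Hypothetical helper: drop rows of `data` that match the values listed in `conds`.
# combine = `&` drops a row if ANY listed column matches;
# combine = `|` drops a row only if ALL listed columns match.
filter_out <- function(data, conds, combine = `&`) {
  if (length(conds) == 0) return(data)
  stopifnot(all(names(conds) %in% names(data)))
  # One logical "keep" vector per column named in conds
  keep_per_col <- Map(function(col, vals) !data[[col]] %in% vals,
                      names(conds), conds)
  data[Reduce(combine, keep_per_col), , drop = FALSE]
}

res_and <- filter_out(long, l)             # same rule as the hard-coded filter
res_or  <- filter_out(long, l, combine = `|`)
```

Passing the combining operator as an argument keeps the & and | versions in one place, at the cost of requiring that every name in the list is a column of the data.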