How to filter a data frame programmatically with dplyr and tidy evaluation?

Suppose I want to filter the starwars data frame programmatically. Here is a simple example that lets me filter by homeworld and species:

library(tidyverse)

# a function that allows the user to supply filters
filter_starwars <- function(filters) {
  for (filter in filters) {
    starwars = filter_at(starwars, filter$var, all_vars(. %in% filter$values))
  }

  return(starwars)
}

# filter Star Wars characters that are human, and from either Tatooine or Alderaan
filter_starwars(filters = list(
  list(var = "homeworld", values = c("Tatooine", "Alderaan")),
  list(var = "species", values = "Human")
))

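But this doesn't let me specify, say, a height filter, because I've hard-coded the %in% operator in the .vars_predicate argument of filter_at(), whereas a height filter would use one of the >, >=, <, <=, or == operators.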

What is the best way to write the filter_starwars() function so that the user can supply filters general enough to filter on any column and use any operator?

NB: with the now-deprecated filter_() function, I could pass a string:

filter_(starwars, "species == 'Human' & homeworld %in% c('Tatooine', 'Alderaan') & height > 175")

But again, it is deprecated.

Try:

filter_starwars <- function(...) {
  F <- quos(...)
  filter(starwars, !!!F)
}

filter_starwars(species == 'Human', homeworld %in% c('Tatooine', 'Alderaan'), height > 175)
# # A tibble: 7 × 13
#                  name height  mass  hair_color skin_color eye_color birth_year
#                 <chr>  <int> <dbl>       <chr>      <chr>     <chr>      <dbl>
# 1         Darth Vader    202   136        none      white    yellow       41.9
# 2           Owen Lars    178   120 brown, grey      light      blue       52.0
# 3   Biggs Darklighter    183    84       black      light     brown       24.0
# 4    Anakin Skywalker    188    84       blond       fair      blue       41.9
# 5         Cliegg Lars    183    NA       brown       fair      blue       82.0
# 6 Bail Prestor Organa    191    NA       black        tan     brown       67.0
# 7     Raymus Antilles    188    79       brown      light     brown         NA
# # ... with 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
# #   films <list>, vehicles <list>, starships <list>

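See https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html. In short, quos() captures the arguments in ... as a list of quosures without evaluating them, and !!! splices that list into filter() and unquotes each element, so every condition is evaluated there as if it had been typed as a separate argument.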
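To make the capture-then-splice step more concrete, here is a minimal sketch. It assumes a dplyr version with tidy evaluation support and the built-in starwars data; the extra mass condition and its threshold are arbitrary values chosen only for illustration.

```r
library(dplyr)
library(rlang)

# quos() captures each condition as a quosure; nothing is evaluated yet
conds <- quos(species == "Human", height > 175)

# More conditions can be appended programmatically before evaluation
# (mass < 100 is an arbitrary illustrative condition)
conds <- c(conds, quos(mass < 100))

# !!! splices the list so filter() sees three separate conditions
spliced  <- filter(starwars, !!!conds)

# The same filter written out by hand, for comparison
explicit <- filter(starwars, species == "Human", height > 175, mass < 100)

# Both calls should select the same characters
identical(spliced$name, explicit$name)
```

Because the conditions are held as an ordinary list until the final filter() call, they can be assembled conditionally or in a loop.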

Here are some approaches.

1) For this particular problem we don't actually need filter_, rlang, or anything similar. This works:

filter_starwars <- function(...) {
    filter(starwars, ...)
}

# test
filter_starwars(species == 'Human', 
                homeworld %in% c('Tatooine', 'Alderaan'), 
                height > 175)

2) If passing the conditions as character strings is important, then:

library(rlang)

filter_starwars <- function(...) {
    filter(starwars, !!!parse_exprs(paste(..., sep = ";")))
}

# test
filter_starwars("species == 'Human'", 
                "homeworld %in% c('Tatooine', 'Alderaan')", 
                "height > 175")

2a) Or, if a single character vector is to be passed:

library(rlang)

filter_starwars <- function(filters) {
    filter(starwars, !!!parse_exprs(paste(filters, collapse = ";")))
}

# test 
filter_starwars(c("species == 'Human'", 
                  "homeworld %in% c('Tatooine', 'Alderaan')", 
                  "height > 175"))
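If the list-of-filters interface from the question is preferred over strings, one possible variation is to build each condition with rlang::call2(). This is only a sketch: the helper name filter_starwars_spec, the op field, and the allowed_ops whitelist are hypothetical names introduced here, not part of dplyr or rlang.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: each filter is list(var = , op = , values = )
filter_starwars_spec <- function(filters) {
  # Illustrative whitelist, so only comparison operators are accepted
  allowed_ops <- c("%in%", "==", "!=", ">", ">=", "<", "<=")

  conds <- lapply(filters, function(f) {
    stopifnot(f$op %in% allowed_ops)
    # call2() builds the unevaluated call `<var> <op> <values>`
    call2(f$op, sym(f$var), f$values)
  })

  # Splice the list of calls into filter() as separate conditions
  filter(starwars, !!!conds)
}

# Humans from Tatooine or Alderaan who are taller than 175
res <- filter_starwars_spec(list(
  list(var = "homeworld", op = "%in%", values = c("Tatooine", "Alderaan")),
  list(var = "species",   op = "==",   values = "Human"),
  list(var = "height",    op = ">",    values = 175)
))
```

Compared with parsing strings, this keeps user input as data (a column name, an operator from a fixed set, and values), so arbitrary code cannot be injected through a filter string.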