How to filter a data frame programmatically with dplyr and tidy evaluation?
Say I want to filter the starwars data frame programmatically. Here is a simple example that lets me filter on homeworld and species:
library(tidyverse)

# a function that allows the user to supply filters
filter_starwars <- function(filters) {
  for (filter in filters) {
    starwars <- filter_at(starwars, filter$var, all_vars(. %in% filter$values))
  }
  return(starwars)
}

# filter Star Wars characters that are human, and from either Tatooine or Alderaan
filter_starwars(filters = list(
  list(var = "homeworld", values = c("Tatooine", "Alderaan")),
  list(var = "species", values = "Human")
))
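But this does not let me specify, say, a height filter, because I have hard-coded the %in% operator in the .vars_predicate of filter_at(), whereas a height filter would use one of the >, >=, <, <=, or == operators.

What is the best way to write the filter_starwars() function so that the user can supply filters that are general enough to filter along any column and use any operator?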
NB: using the now-deprecated filter_() method, I could pass a string:
filter_(starwars, "species == 'Human' & homeworld %in% c('Tatooine', 'Alderaan') & height > 175")
But again, that is deprecated.
Try:

filter_starwars <- function(...) {
  # named conds rather than F, since F is an alias for FALSE in R
  conds <- quos(...)
  filter(starwars, !!!conds)
}

filter_starwars(species == 'Human', homeworld %in% c('Tatooine', 'Alderaan'), height > 175)
# # A tibble: 7 × 13
# name height mass hair_color skin_color eye_color birth_year
# <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
# 1 Darth Vader 202 136 none white yellow 41.9
# 2 Owen Lars 178 120 brown, grey light blue 52.0
# 3 Biggs Darklighter 183 84 black light brown 24.0
# 4 Anakin Skywalker 188 84 blond fair blue 41.9
# 5 Cliegg Lars 183 NA brown fair blue 82.0
# 6 Bail Prestor Organa 191 NA black tan brown 67.0
# 7 Raymus Antilles 188 79 brown light brown NA
# # ... with 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
# # films <list>, vehicles <list>, starships <list>
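See https://cran.r-project.org/web/packages/dplyr/vignettes/programming.html. In short, quos() captures the arguments in ... as a list of quosures without evaluating them, and !!! splices that list into filter(), unquoting each element so that it is evaluated there.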
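To make the capture-then-splice step concrete, here is a minimal sketch. It assumes dplyr and rlang are installed; the variable names `conds`, `spliced` and `by_hand` are only illustrative.

```r
library(dplyr)
library(rlang)

# quos() captures the expressions unevaluated, as a list of quosures
conds <- quos(species == "Human", height > 175)
length(conds)             # 2
quo_get_expr(conds[[1]])  # the first captured expression, still unevaluated

# Splicing the list with !!! should match writing the conditions out by hand
spliced <- filter(starwars, !!!conds)
by_hand <- filter(starwars, species == "Human", height > 175)
identical(spliced$name, by_hand$name)
```

Nothing is evaluated until filter() receives the spliced conditions, which is why column names such as species can appear in them without being defined in the calling environment.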
Here are some approaches.

1) For this particular problem we do not actually need filter_(), rlang, or anything similar. This works:
filter_starwars <- function(...) {
  # forward the conditions to filter() unchanged
  filter(starwars, ...)
}

# test
filter_starwars(species == 'Human',
                homeworld %in% c('Tatooine', 'Alderaan'),
                height > 175)
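The forwarding wrapper in 1) should also accept conditions that were built elsewhere, because `!!!` at the call site is resolved when filter() captures the dots. As a sketch of how the list-style specification from the question could be kept, each filter below carries an extra `op` field and is turned into an unevaluated call with rlang's `call2()` and `sym()`. The `op` field, the `allowed_ops` whitelist and the helper name `spec_to_conds()` are assumptions made for illustration and are not part of the original question or answers.

```r
library(dplyr)
library(rlang)

# Same wrapper as in approach 1
filter_starwars <- function(...) {
  filter(starwars, ...)
}

# Hypothetical helper: turn list(var =, op =, values =) specs into calls
spec_to_conds <- function(filters) {
  allowed_ops <- c("%in%", "==", "!=", ">", ">=", "<", "<=")
  lapply(filters, function(f) {
    # reject anything that is not a known comparison operator
    stopifnot(f$op %in% allowed_ops)
    # e.g. call2(">", sym("height"), 175) builds the call `height > 175`
    call2(f$op, sym(f$var), f$values)
  })
}

conds <- spec_to_conds(list(
  list(var = "homeworld", op = "%in%", values = c("Tatooine", "Alderaan")),
  list(var = "species",   op = "==",   values = "Human"),
  list(var = "height",    op = ">",    values = 175)
))

# Splice the built conditions into the wrapper
res <- filter_starwars(!!!conds)
```

Restricting `op` to a fixed set keeps the specification as plain data, so no user-supplied string is ever parsed as code.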
2) If passing the conditions as character strings is important, then:
library(rlang)

filter_starwars <- function(...) {
  filter(starwars, !!!parse_exprs(paste(..., sep = ";")))
}

# test
filter_starwars("species == 'Human'",
                "homeworld %in% c('Tatooine', 'Alderaan')",
                "height > 175")
2a) Or, if a single character vector is to be passed:
library(rlang)

filter_starwars <- function(filters) {
  filter(starwars, !!!parse_exprs(paste(filters, collapse = ";")))
}

# test
filter_starwars(c("species == 'Human'",
                  "homeworld %in% c('Tatooine', 'Alderaan')",
                  "height > 175"))