Call a function from the global environment with implicit data frame variables (from the calling environment?) inside dplyr::summarise or mutate

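I want to create a list of functions in the global environment and call them as needed inside mutate or summarise, to make my dplyr code less verbose. The problem is that the functions must use variables defined inside the data frame, not in the global environment. It probably all comes down to object scoping, which is a bit tricky for me.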

For all of the code below, load the required libraries:

library(dplyr)
library(purrr)
library(rlang)

An example: with the mtcars dataset, I want to group_by one variable and summarise using these three functions: any_vs_four_gears, any_am_high_hp and all_combined.

I can define them inside the call to summarise as follows, which works fine:

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = any(vs == 1 & gear == 4),
                  any_am_high_hp = any(am == 1 & hp >170),
                  all_combined = all(any_vs_four_gears, any_am_high_hp))

# # A tibble: 6 × 4
#    carb any_vs_four_gears any_am_high_hp all_combined
#   <dbl> <lgl>             <lgl>          <lgl>
# 1     1 TRUE              FALSE          FALSE
# 2     2 TRUE              FALSE          FALSE
# 3     3 FALSE             FALSE          FALSE
# 4     4 TRUE              TRUE           TRUE
# 5     6 FALSE             TRUE           FALSE
# 6     8 FALSE             TRUE           FALSE

I can also define the functions as expressions and then evaluate those expressions inside the call to summarise, like this:

expressions_as_strings <- list(any_vs_four_gears = 'any(vs == 1 & gear == 4)',
                               any_am_high_hp = 'any(am == 1 & hp >170)',
                               all_combined = 'all(any_vs_four_gears, any_am_high_hp)')
expressions <- map(expressions_as_strings, parse_expr)

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = !!expressions$any_vs_four_gears,
                  any_am_high_hp = !!expressions$any_am_high_hp,
                  all_combined = !!expressions$all_combined)

However, I feel I could get more flexibility if I could define functions instead of expressions.

I have tried a couple of approaches without success:

method_1

method_1 <- list(any_vs_four_gears = function() any(vs == 1 & gear == 4),
                  any_am_high_hp = function() any(am == 1 & hp >170),
                  all_combined = function() all(any_vs_four_gears, any_am_high_hp))
#example

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = method_1$any_vs_four_gears())

method_1 fails. I think this is because the function looks up the values of vs and gear in the global environment rather than in the data.
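That diagnosis matches R's lexical scoping: a function looks up free variables such as vs and gear in the environment where it was defined (here the global environment), not in the environment it is called from. A minimal base R sketch with a made-up toy data frame df shows the failure, and shows that re-pointing a copy of the function's enclosing environment at the data makes it work. This only illustrates the scoping rule; it is not a recommended pattern.

```r
# Toy data, made up for illustration
df <- data.frame(vs = c(0, 1, 1), gear = c(3, 4, 5))

f <- function() any(vs == 1 & gear == 4)

# vs is looked up in f's enclosing (global) environment, where it doesn't exist
msg <- tryCatch(f(), error = function(e) conditionMessage(e))
print(msg)

# Copy f and give the copy an enclosing environment built from the data
# columns, with the global environment as its parent
g <- f
environment(g) <- list2env(df, parent = globalenv())
print(g())
```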

method_2

method_2 <- list(any_vs_four_gears = function(var1, var2) {any({{var1}} == 1 & {{var2}} == 4)},
                any_am_high_hp = function(var1, var2) {any({{var1}} == 1 & {{var2}} > 170)},
                all_combined = function(var1, var2) {all({{var1}}, {{var2}})})

# example

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = method_2$any_vs_four_gears(vs, gear))

method_2 does work, but I have to pass the variables to the function as arguments, which I would like to avoid.
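As a side note, the {{ }} in method_2 isn't doing any tidy-eval work: any() is not a data-masking function, so {{var1}} is just two nested pairs of braces. method_2 works because R evaluates arguments lazily, and the argument expressions vs and gear are evaluated in the environment summarise() called the function from, which is its data mask. A minimal sketch with plain arguments should therefore give the same result; the name any_one_and_four is hypothetical.

```r
library(dplyr)

# Plain arguments: the promises for var1/var2 are forced inside summarise()'s
# data mask, so they resolve to the current group's vs and gear columns
any_one_and_four <- function(var1, var2) any(var1 == 1 & var2 == 4)

res <- mtcars %>%
  group_by(carb) %>%
  summarise(any_vs_four_gears = any_one_and_four(vs, gear))

print(res)
```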

Main question

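Is there a way to create a function that uses variables from the data frame without having to pass the variable names as arguments? What I want is something like method_1, as in this pseudocode: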

mtcars %>%
        group_by(carb) %>%
        summarise(any_vs_four_gears = method_x$any_vs_four_gears(),
                  any_am_high_hp = method_x$any_am_high_hp(),
                  all_combined = method_x$all_combined())

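Answer

Up front: I generally argue against writing functions that break functional reproducibility. I have spent too much time troubleshooting functions whose behavior changed based on things that were not passed to them.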

That said, try this:

method_1 <- list(
  any_vs_four_gears = function(data = cur_data()) with(data, any(vs == 1 & gear == 4)),
  any_am_high_hp = function(data = cur_data()) with(data, any(am == 1 & hp > 170)),
  all_combined = function(data = cur_data()) with(data, all(any_vs_four_gears, any_am_high_hp))
)

mtcars %>%
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_1$any_vs_four_gears(),
    any_am_high_hp = method_1$any_am_high_hp(),
    all_combined = method_1$all_combined()
  )
# # A tibble: 6 x 4
#    carb any_vs_four_gears any_am_high_hp all_combined
#   <dbl> <lgl>             <lgl>          <lgl>       
# 1     1 TRUE              FALSE          FALSE       
# 2     2 TRUE              FALSE          FALSE       
# 3     3 FALSE             FALSE          FALSE       
# 4     4 TRUE              TRUE           TRUE        
# 5     6 FALSE             TRUE           FALSE       
# 6     8 FALSE             TRUE           FALSE       

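This uses the cur_data() pronoun/function, which is available inside a dplyr verb, adds only a little surrounding code (with(data, { ... }), which also accepts multi-line {-expressions), and works "as is". Note that cur_data() was deprecated in dplyr 1.1.0 in favour of pick().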
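Because data is an ordinary argument whose default, cur_data(), is evaluated only when nothing is passed, the same functions can also be called outside a pipeline with an explicit data frame, which eases some of the reproducibility concern raised above. A small sketch, repeating only the first function:

```r
method_1 <- list(
  any_vs_four_gears = function(data = cur_data()) with(data, any(vs == 1 & gear == 4))
)

# Passing data explicitly: the cur_data() default is never evaluated,
# so this runs without a dplyr pipeline
whole  <- method_1$any_vs_four_gears(mtcars)
carb_3 <- method_1$any_vs_four_gears(mtcars[mtcars$carb == 3, ])

print(c(whole = whole, carb_3 = carb_3))
```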

The errors are also easy to interpret:

mtcars %>%
  select(-vs) %>%     # intentionally setting up an error
  group_by(carb) %>%
  summarise(
    any_vs_four_gears = method_1$any_vs_four_gears(),
    any_am_high_hp = method_1$any_am_high_hp(),
    all_combined = method_1$all_combined()
  )
# Error: Problem with `summarise()` column `any_vs_four_gears`.
# i `any_vs_four_gears = method_1$any_vs_four_gears()`.
# x object 'vs' not found
# i The error occurred in group 1: carb = 1.
# Run `rlang::last_error()` to see where the error occurred.
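If even the data argument is unwanted, a closer match to the question's method_x pseudocode is a sketch built on base R's parent.frame(): each function evaluates a quoted body in the environment it was called from. When the call sits inside summarise(), that environment is dplyr's data mask, so the columns, and the summaries created earlier in the same call, are found there. This relies on how dplyr evaluates expressions rather than on a documented guarantee, and it has the same hidden-dependency drawback described above, so treat it as an illustration rather than a recommendation.

```r
library(dplyr)

# Each function evaluates its quoted body in the environment it was called
# from. Inside summarise() that is dplyr's data mask, so column names (and
# summaries created earlier in the same call) resolve there.
method_x <- list(
  any_vs_four_gears = function() eval(quote(any(vs == 1 & gear == 4)), parent.frame()),
  any_am_high_hp    = function() eval(quote(any(am == 1 & hp > 170)), parent.frame()),
  all_combined      = function() eval(quote(all(any_vs_four_gears, any_am_high_hp)), parent.frame())
)

res <- mtcars %>%
  group_by(carb) %>%
  summarise(any_vs_four_gears = method_x$any_vs_four_gears(),
            any_am_high_hp    = method_x$any_am_high_hp(),
            all_combined      = method_x$all_combined())

print(res)
```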