R 3.5.2: Pipe inside custom function - object 'column' not found

Question

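I'm having trouble using pipes inside a custom function. From earlier posts I gather that piping inside a function creates another level (?), which is what leads to the error I get (see below).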

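I want to write a summary function for a large dataset with hundreds of numeric and categorical variables. I'd like to be able to use it on different data frames (with a similar structure), always grouping by some factor variable and getting summaries of multiple columns.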

library(tidyverse)
data(iris)

iris %>% group_by(Species) %>% summarise(count = n(), mean = mean(Sepal.Length, na.rm = T))

# A tibble: 3 x 3
  Species    count  mean
  <fct>      <int> <dbl>
1 setosa        50  5.01
2 versicolor    50  5.94
3 virginica     50  6.59

I'd like to create a function like this:

sum_cols <- function (df, col) {
  df %>%
    group_by(Species) %>%
    summarise(count = n(),
              mean = mean(col, na.rm = T))
}

This is the error I get:

sum_cols(iris, Sepal.Length)

Error in mean(col, na.rm = T) : object 'Sepal.Length' not found
Called from: mean(col, na.rm = T)

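I've been stuck on this for a while, and although I looked for answers in several earlier posts, I still don't fully understand why the problem occurs or how to get around it.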

Any help is greatly appreciated, thanks!

Answer 1

Try searching for non-standard evaluation (NSE).
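As a rough illustration of what NSE means, here is a minimal base R sketch. The helper name show_expr is hypothetical and exists only for this example. It shows that a bare column name is just a symbol until something evaluates it, and that evaluating it only works when the data frame is supplied as the lookup context. In the question's function, col is forced as an ordinary variable, so R looks for Sepal.Length outside the data frame and fails.

```r
# Hypothetical helper, for illustration only: return the argument
# as an unevaluated expression instead of its value.
show_expr <- function(x) substitute(x)

# No error here, because Sepal.Length is captured, not evaluated.
expr <- show_expr(Sepal.Length)
class(expr)  # "name"

# Evaluating the captured symbol with the data frame as the lookup
# context finds the column. This is conceptually similar to how
# dplyr verbs resolve bare column names.
mean(eval(expr, iris))
```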

You can use {{ }} here to let R know that col is a column name in df.

library(dplyr)
library(rlang)

sum_cols <- function (df, col) { 
  df %>% 
    group_by(Species) %>% 
    summarise(count = n(), mean = mean({{col}}, na.rm = T)) 
  }

sum_cols(iris, Sepal.Length)

# A tibble: 3 x 3
#  Species    count  mean
#  <fct>      <int> <dbl>
#1 setosa        50  5.01
#2 versicolor    50  5.94
#3 virginica     50  6.59

If you don't have a recent rlang ({{ }} requires rlang 0.4.0 or later), you can use the older approach with enquo and !!:

sum_cols <- function (df, col) { 
   df %>% 
     group_by(Species) %>% 
     summarise(count = n(), mean = mean(!!enquo(col), na.rm = T)) 
}

sum_cols(iris, Sepal.Length)
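
The question also mentions wanting to group by different factor variables on different data frames. The same {{ }} approach should extend to the grouping column. The following is a minimal sketch under that assumption; the function name sum_by and its group argument are hypothetical and not part of the original code, and it assumes rlang 0.4.0 or later.

```r
library(dplyr)

# Hypothetical variant of sum_cols: both the grouping column and the
# summarised column are passed as bare names and embraced with {{ }}.
sum_by <- function(df, group, col) {
  df %>%
    group_by({{ group }}) %>%
    summarise(count = n(),
              mean = mean({{ col }}, na.rm = TRUE))
}

res <- sum_by(iris, Species, Petal.Width)
res
```

Because nothing is hard-coded, the same call pattern can be reused on another data frame with a different factor column, for example sum_by(mtcars, cyl, mpg).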
