How to use "summarise" from dplyr with dynamic column names?

Question

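I am using the summarise function from the dplyr package in R to compute group means from a table. I would like to do this dynamically, using a column name string stored in another variable.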

The following is the "normal" way, and of course it works:

library(dplyr)

myTibble <- group_by(iris, Species)
summarise(myTibble, avg = mean(Sepal.Length))

# A tibble: 3 x 2
  Species      avg
  <fct>      <dbl>
1 setosa      5.01
2 versicolor  5.94
3 virginica   6.59

However, I would like to do something like this:

myTibble <- group_by( iris, Species)
colOfInterest <- "Sepal.Length"
summarise( myTibble, avg = mean( colOfInterest))

I have read the Programming with dplyr page and tried a bunch of combinations of quo, enquo, !!, .dots=(...), and so on, but I have not figured out the right way to do it yet.

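I am also aware of an existing answer, but 1) when I use the standard-evaluation function summarise_, R tells me that it is deprecated, and 2) that answer does not seem elegant at all. So, is there a good, simple way to do this?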

Thanks!

Answer 1

1) Use !!sym(...) like this:

library(dplyr)

colOfInterest <- "Sepal.Length"
iris %>% 
  group_by(Species) %>%
  # sym() turns the string into a symbol; !! unquotes it inside mean()
  summarize(avg = mean(!!sym(colOfInterest))) %>%
  ungroup()

giving:

# A tibble: 3 x 2
  Species      avg
  <fct>      <dbl>
1 setosa      5.01
2 versicolor  5.94
3 virginica   6.59

2) A second approach is to index the .data pronoun with the string:

colOfInterest <- "Sepal.Length"
iris %>% 
  group_by(Species) %>%
  summarize(avg = mean(.data[[colOfInterest]])) %>%
  ungroup()

Of course, this is straightforward in base R:

aggregate(list(avg = iris[[colOfInterest]]), iris["Species"], mean)
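
The .data[[...]] form from 2) also carries over to a reusable function, since both the grouping column and the value column can be passed as strings. The following is only a sketch: group_mean() is a hypothetical helper name, not part of dplyr, and it assumes dplyr 1.0.0 or later for across(), all_of() and the .groups argument.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): group means with both
# column names supplied as character strings.
group_mean <- function(data, group_col, value_col) {
  data %>%
    # all_of() selects the grouping column named by the string
    group_by(across(all_of(group_col))) %>%
    # .data[[...]] looks up the value column by name inside the data frame
    summarise(avg = mean(.data[[value_col]]), .groups = "drop")
}

res <- group_mean(iris, "Species", "Sepal.Length")
res
```

Passing .groups = "drop" returns an ungrouped tibble, so a trailing ungroup() is not needed in this version.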

Answer 2

Another solution:

iris %>% 
  group_by(Species) %>% 
  summarise_at(vars("Sepal.Length"), mean) %>%
  ungroup()

# A tibble: 3 x 2
  Species    Sepal.Length
  <fct>             <dbl>
1 setosa             5.01
2 versicolor         5.94
3 virginica          6.59
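
Since dplyr 1.0.0 the scoped verbs such as summarise_at() are superseded by across(). A minimal sketch of the same idea with the column name held in a variable, assuming dplyr 1.0.0 or later (colOfInterest is the variable from the question):

```r
library(dplyr)

colOfInterest <- "Sepal.Length"

res <- iris %>%
  group_by(Species) %>%
  # all_of() tells tidyselect to treat the string as a column name
  summarise(across(all_of(colOfInterest), mean), .groups = "drop")

res
```

As with summarise_at(), the result column keeps the original name (Sepal.Length) rather than avg.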

r

dplyr

summarize