Specify variable names when grouping

Question

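I am using dplyr v1.0.2 to manipulate tibbles. I would like to use group_by(), specifying the relevant variable names (the ... argument) with a function or a regular expression. The only solution I have found is clunky. Is there a simpler way?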

Here is a minimal example that demonstrates the problem:

library(dplyr)
data(iris)
iris[, -(rbinom(1, 1, .5) + 1) ] %>%  # randomly drop "Sepal.Length" or "Sepal.Width"
  group_by(matches("^Sepal\\."))      # "\\." is needed: "\." is not a valid escape in an R string

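In the third line, I randomly drop one of the two "Sepal" columns. In the last line, I want to group by the remaining "Sepal" column. The problem is that I don't know its name: it could be "Sepal.Length" or "Sepal.Width". The group_by() command in the last line doesn't work: predictably, it returns a `matches() must be used within a *selecting* function` error message.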

By contrast, this code works, but it is a bit clunky:

iris[, -(rbinom(1, 1, .5) + 1) ] %>%
  group_by(!!as.name(grep('Sepal', colnames(.), value = TRUE)))

Is there a simpler way to do the grouping in the second line?
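For reference, the workaround in the previous snippet works in three steps: look up the column name as a string, convert it to a symbol, and inject that symbol into group_by() with `!!`. A minimal sketch, assuming dplyr is installed; the column drop is fixed (not random) so the result is predictable, and the names `df`, `sepal_name`, and `sepal_sym` are only for illustration:

```r
library(dplyr)

df <- iris[, -2]  # drop "Sepal.Width" so only "Sepal.Length" remains

# Step 1: grep() with value = TRUE returns the matching column name as a string.
sepal_name <- grep("Sepal", colnames(df), value = TRUE)

# Step 2: as.name() turns that string into a symbol.
sepal_sym <- as.name(sepal_name)

# Step 3: !! injects the symbol into group_by(), as if the name had been typed.
g <- df %>% group_by(!!sepal_sym)

group_vars(g)
```

This is why the workaround feels clunky: the name lookup happens outside tidyselect, so the regular expression has to be applied by hand with grep().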

Answer 1

How about using across() to select the columns?

iris[, -(rbinom(1, 1, .5) + 1) ] %>%
  group_by(across(starts_with('Sepal')))

# A tibble: 150 x 4
# Groups:   Sepal.Length [35]
   Sepal.Length Petal.Length Petal.Width Species
          <dbl>        <dbl>       <dbl> <fct>  
 1          5.1          1.4         0.2 setosa 
 2          4.9          1.4         0.2 setosa 
 3          4.7          1.3         0.2 setosa 
 4          4.6          1.5         0.2 setosa 
 5          5            1.4         0.2 setosa 
 6          5.4          1.7         0.4 setosa 
 7          4.6          1.4         0.3 setosa 
 8          5            1.5         0.2 setosa 
 9          4.4          1.4         0.2 setosa 
10          4.9          1.5         0.1 setosa 
# … with 140 more rows
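The same idea works with the question's own regular expression: across() evaluates its first argument in a selecting context, so any tidyselect helper such as matches() is allowed there. A minimal sketch, assuming dplyr 1.0.0 or later; the helper name `group_by_sepal()` is hypothetical, and the column drop is made deterministic so both possible outcomes can be checked:

```r
library(dplyr)

# Hypothetical helper: group by whichever "Sepal" column survives,
# using matches() inside across() so the regex is evaluated by tidyselect.
group_by_sepal <- function(df) {
  df %>% group_by(across(matches("^Sepal\\.")))
}

# Check both possible outcomes of the random column drop deterministically.
g_len <- group_by_sepal(iris[, -2])  # "Sepal.Width" dropped
g_wid <- group_by_sepal(iris[, -1])  # "Sepal.Length" dropped

group_vars(g_len)
group_vars(g_wid)
```

In dplyr 1.1.0 and later, pick() is recommended over calling across() without `.fns`, so `group_by(pick(matches("^Sepal\\.")))` should be the equivalent there; pick() does not exist in dplyr 1.0.2.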
