Use quoted variable in group_by() %>% mutate() function call
Reproducible example
library(dplyr)
cats <-
  data.frame(
    name = c(letters[1:10]),
    weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
    type = c(rep("not_fat", 5), rep("fat", 5))
  )
get_means <- function(df, metric, group) {
  df %>%
    group_by(.[[group]]) %>%
    mutate(mean_stat = mean(.[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}
get_means(cats, metric = "weight", group = "type")
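What I've tried
I expected to get two values, but I get only one. It looks like the grouping fails.
I have tried everything, including quo(), eval(), substitute(), UQ(), !! and many other approaches, to get group_by() to work.
This seems simple, but I can't figure it out.
Reasoning behind the code
I put the variables in quotes because I use them in a ggplot aes_string() call. I left the ggplot code out of the function to keep the example simple; otherwise it would be easy, because we could use standard evaluation.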
If you want to use strings as names, as in your example, the correct way is to convert the string to a symbol with sym and unquote it with !!:
get_means <- function(df, metric, group) {
  df %>%
    group_by(!!sym(group)) %>%
    mutate(mean_stat = mean(!!sym(metric))) %>%
    pull(mean_stat) %>%
    unique()
}
get_means(cats, metric = "weight", group = "type")
[1] 10.06063 17.45906
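For intuition about what sym() and !! do, here is a rough base-R analogue. It is an illustration only, not how rlang works internally. as.name() turns the string into a symbol, bquote() splices that symbol into an expression via .(), and eval() evaluates the expression with the data frame's columns in scope. The toy data frame and variable names are hypothetical.

```r
# Hypothetical toy data with easy-to-check values
toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

metric <- "weight"
col  <- as.name(metric)          # string -> symbol, roughly like rlang::sym()
expr <- bquote(mean(.(col)))     # splice the symbol in, roughly like !!
print(expr)                      # mean(weight)
res  <- eval(expr, envir = toy)  # `weight` is looked up among toy's columns
print(res)                       # 15
```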
If you want to pass bare names to the function, use enquo and !!:
get_means <- function(df, metric, group) {
  group <- enquo(group)
  metric <- enquo(metric)
  df %>%
    group_by(!!group) %>%
    mutate(mean_stat = mean(!!metric)) %>%
    pull(mean_stat) %>%
    unique()
}
get_means(cats, metric = weight, group = type)
[1] 10.06063 17.45906
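The bare-name version can be sketched in base R with substitute(), which captures the expression the caller typed. This is only an analogue. enquo() also records the caller's environment in a quosure, which makes it more robust inside nested functions. The helper mean_of() is hypothetical.

```r
toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

# Hypothetical helper: capture the unevaluated argument, then evaluate it
# with the data frame's columns in scope (roughly enquo() followed by !!)
mean_of <- function(df, metric) {
  expr <- substitute(metric)              # e.g. the symbol `weight`
  mean(eval(expr, df, parent.frame()))    # columns first, then caller's env
}

mean_of(toy, weight)       # 15
mean_of(toy, weight * 2)   # 30: any expression over the columns works
```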
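What is happening in your example?
Interestingly, .[[group]] does work for grouping, just not the way you think. It extracts the specified column of the data frame as a vector, adds that vector as a new variable, and groups on it: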
cats %>%
group_by(.[['type']])
# A tibble: 10 x 4
# Groups: .[["type"]] [2]
name weight type `.[["type"]]`
<fct> <dbl> <fct> <fct>
1 a 9.60 not_fat not_fat
2 b 8.71 not_fat not_fat
3 c 12.0 not_fat not_fat
4 d 8.48 not_fat not_fat
5 e 11.5 not_fat not_fat
6 f 17.0 fat fat
7 g 20.3 fat fat
8 h 17.3 fat fat
9 i 15.3 fat fat
10 j 17.4 fat fat
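Your problem comes from the mutate statement. mutate(mean_stat = mean(.[['weight']])) does not pick out each group's rows. It simply extracts the entire weight column as a vector, computes its mean, and assigns that single value to the new column: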
cats %>%
group_by(.[['type']]) %>%
mutate(mean_stat = mean(.[['weight']]))
# A tibble: 10 x 5
# Groups: .[["type"]] [2]
name weight type `.[["type"]]` mean_stat
<fct> <dbl> <fct> <fct> <dbl>
1 a 9.60 not_fat not_fat 13.8
2 b 8.71 not_fat not_fat 13.8
3 c 12.0 not_fat not_fat 13.8
4 d 8.48 not_fat not_fat 13.8
5 e 11.5 not_fat not_fat 13.8
6 f 17.0 fat fat 13.8
7 g 20.3 fat fat 13.8
8 h 17.3 fat fat 13.8
9 i 15.3 fat fat 13.8
10 j 17.4 fat fat 13.8
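The same difference can be reproduced in base R on hypothetical toy data. Indexing the full data frame with [[ ]] ignores any grouping. A per-group computation such as tapply() returns one mean per level.

```r
toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

# What mean(.[["weight"]]) computes: one mean over the whole column
overall <- mean(toy[["weight"]])
overall                      # 15

# What the question expects: one mean per group
per_group <- tapply(toy[["weight"]], toy[["type"]], mean)
per_group                    # fat = 20, not_fat = 10
```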
I would modify it slightly (if I understand correctly what you are trying to achieve):
get_means <- function(df, metric, group) {
  df %>%
    group_by(!!sym(group)) %>%
    summarise(mean_stat = mean(!!sym(metric))) %>%
    pull(mean_stat)
}
get_means(cats, "weight", "type")
[1] 20.671772 9.305811
This gives exactly the same output as:
cats %>% group_by(type) %>% summarise(mean_stat = mean(weight)) %>%
  pull(mean_stat)
[1] 20.671772 9.305811
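The two approaches return the means in different orders. A grouped mutate() followed by unique() keeps the order in which groups first appear in the data, so not_fat comes first. summarise() returns one row per group with the groups sorted, so fat comes before not_fat. That is why the summarise() output lists the larger mean first.

Here is a base-R sketch of both behaviours on hypothetical toy data. ave() and aggregate() serve as stand-ins for the dplyr verbs.

```r
toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

# Like group_by() + mutate() + unique(): per-row group means, in row order
row_order <- unique(ave(toy$weight, toy$type, FUN = mean))
row_order          # 10 20

# Like group_by() + summarise(): one row per group, groups sorted
agg <- aggregate(weight ~ type, data = toy, FUN = mean)
agg                # fat -> 20, not_fat -> 10
```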
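I think the "intended" way to do this in the tidyeval framework is to pass the arguments as names (not strings) and then quote them with enquo(). ggplot2 understands the tidy evaluation operators, so this approach works with ggplot2 as well.
First, let's adapt the dplyr summary function from your example: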
library(tidyverse)
library(rlang)
get_means <- function(df, metric, group) {
  metric = enquo(metric)
  group = enquo(group)
  df %>%
    group_by(!!group) %>%
    summarise(!!paste0("mean_", as_label(metric)) := mean(!!metric))
}
get_means(cats, weight, type)
type mean_weight
1 fat 20.0
2 not_fat 10.2
get_means(iris, Petal.Width, Species)
Species mean_Petal.Width
1 setosa 0.246
2 versicolor 1.33
3 virginica 2.03
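Since rlang 0.4.0, the pattern !!enquo(x) can be abbreviated with the embrace operator {{ x }}. Recent dplyr versions also accept the same syntax inside a glue-style name on the left of :=. The sketch below assumes dplyr 1.0 or later; get_means_embrace() and the toy data are hypothetical.

```r
library(dplyr)

toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

# {{ x }} is shorthand for !!enquo(x); inside a name string it is replaced
# by the label of the captured argument
get_means_embrace <- function(df, metric, group) {
  df %>%
    group_by({{ group }}) %>%
    summarise("mean_{{ metric }}" := mean({{ metric }}), .groups = "drop")
}

res <- get_means_embrace(toy, weight, type)
print(res)   # columns type and mean_weight; fat = 20, not_fat = 10
```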
Now add the ggplot call:
get_means <- function(df, metric, group) {
  metric = enquo(metric)
  group = enquo(group)
  df %>%
    group_by(!!group) %>%
    summarise(mean_stat = mean(!!metric)) %>%
    ggplot(aes(!!group, mean_stat)) +
    geom_point()
}
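The question passes column names as strings for aes_string(). Current ggplot2 documentation recommends the .data pronoun inside aes() instead, and it works with strings just as in the dplyr verbs.

The sketch below assumes dplyr 1.0+ and ggplot2; plot_means() and toy are hypothetical.

```r
library(dplyr)
library(ggplot2)

toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

# String arguments throughout: across(all_of()) groups by name and keeps
# the column name; .data[[...]] looks columns up by name inside aes()
plot_means <- function(df, metric, group) {
  df %>%
    group_by(across(all_of(group))) %>%
    summarise(mean_stat = mean(.data[[metric]]), .groups = "drop") %>%
    ggplot(aes(x = .data[[group]], y = mean_stat)) +
    geom_point()
}

p <- plot_means(toy, "weight", "type")
```

Printing p draws one point per group.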
get_means(cats, weight, type)
I'm not sure what kind of plot you want, but you can use tidy evaluation to plot both the data and the summary values. For example:
plot_func = function(data, metric, group) {
  metric = enquo(metric)
  group = enquo(group)
  data %>%
    ggplot(aes(!!group, !!metric)) +
    geom_point() +
    geom_point(data = . %>%
                 group_by(!!group) %>%
                 summarise(!!metric := mean(!!metric)),
               shape = "_", colour = "red", size = 8) +
    expand_limits(y = 0) +
    # expansion() replaces expand_scale(), deprecated in ggplot2 3.3.0
    scale_y_continuous(expand = expansion(mult = c(0, 0.02)))
}
plot_func(cats, weight, type)
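The data = . %>% ... argument in plot_func() relies on ggplot2 accepting a function as a layer's data. The function is called with the plot's data, and its result is used for that layer only.

Here is the same trick with an ordinary anonymous function and aggregate(), on hypothetical toy data:

```r
library(ggplot2)

toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

p <- ggplot(toy, aes(type, weight)) +
  geom_point() +
  # Layer data given as a function of the plot data: collapse to group means
  geom_point(data = function(d) aggregate(weight ~ type, data = d, FUN = mean),
             shape = "_", colour = "red", size = 8)

built <- ggplot_build(p)
sapply(built$data, nrow)   # 6 raw points, 2 mean markers
```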
FYI, to let the function accept any number of grouping columns, you can use the ... argument and enquos instead of enquo. This also requires using !!! (unquote-splice) instead of !! (unquote).
get_means <- function(df, metric, ...) {
  metric = enquo(metric)
  groups = enquos(...)
  df %>%
    group_by(!!!groups) %>%
    summarise(!!paste0("mean_", quo_text(metric)) := mean(!!metric))
}
get_means(mtcars, mpg, cyl, vs)
cyl vs mean_mpg
1 4 0 26
2 4 1 26.7
3 6 0 20.6
4 6 1 19.1
5 8 0 15.1
get_means(mtcars, mpg)
mean_mpg
1 20.1
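Dots can also be forwarded directly to group_by(...) without capturing them first, and dplyr then receives the caller's expressions unchanged. Combined with {{ }} for the metric, a shorter variant might look like the sketch below. It assumes dplyr 1.0+; get_means_dots() and toy are hypothetical.

```r
library(dplyr)

toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

get_means_dots <- function(df, metric, ...) {
  df %>%
    group_by(...) %>%                          # zero or more grouping columns
    summarise(mean_stat = mean({{ metric }}), .groups = "drop")
}

by_type <- get_means_dots(toy, weight, type)   # one row per type
overall <- get_means_dots(toy, weight)         # no groups: a single row
```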
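The magrittr pronoun . stands for the whole data set, so you have taken the mean over all observations. Instead, use the tidy eval pronoun .data, which stands for the slice of the data frame belonging to the current group:
get_means <- function(df, metric, group) {
  df %>%
    group_by(.data[[group]]) %>%
    mutate(mean_stat = mean(.data[[metric]])) %>%
    pull(mean_stat) %>%
    unique()
}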
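Putting both pronouns side by side in one grouped mutate() makes the difference visible. The magrittr . is the whole piped data frame, while .data refers to the current group's rows. The toy data is hypothetical.

```r
library(dplyr)

toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

res <- toy %>%
  group_by(type) %>%
  mutate(
    whole     = mean(.[["weight"]]),       # . = entire data frame: 15 everywhere
    per_group = mean(.data[["weight"]])    # .data = current group: 10 or 20
  )
```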
Using the *_at functions (these scoped variants were superseded by across() in dplyr 1.0.0):
library(dplyr)
get_means <- function(df, metric, group) {
  df %>%
    group_by_at(group) %>%
    mutate_at(metric, list(mean_stat = mean)) %>%
    pull(mean_stat) %>%
    unique()
}
get_means(cats, metric = "weight", group = "type")
# [1] 10.12927 20.40541
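Because the *_at verbs are superseded, the same function can be written with across() and all_of(), which also accept character vectors of column names. The sketch below assumes dplyr 1.0+; get_means_across() and toy are hypothetical.

```r
library(dplyr)

toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

get_means_across <- function(df, metric, group) {
  df %>%
    group_by(across(all_of(group))) %>%
    # .names fixes the output column name, like list(mean_stat = mean)
    mutate(across(all_of(metric), mean, .names = "mean_stat")) %>%
    pull(mean_stat) %>%
    unique()
}

get_means_across(toy, metric = "weight", group = "type")   # 10 20
```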
Data
set.seed(1)
cats <-
  data.frame(
    name = c(letters[1:10]),
    weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
    type = c(rep("not_fat", 5), rep("fat", 5))
  )
Updated answer using across(), .data and glue-style {} naming, keeping the original function arguments as strings, as in the question:
library(tidyverse)
get_means <- function(dat = mtcars, metric = "wt", group = "cyl") {
  dat %>%
    group_by(across(all_of(c(group)))) %>%
    summarise("{paste0('mean_',metric)}" := mean(.data[[metric]]), .groups = "keep")
}
get_means()
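In the {} name syntax, the expression inside the braces is evaluated in the function's environment. A plain "mean_{metric}" is therefore enough when metric is a string, and the paste0() call is not required. The sketch below uses hypothetical toy data and assumes dplyr 1.0+.

```r
library(dplyr)

toy <- data.frame(
  weight = c(9, 10, 11, 19, 20, 21),
  type   = c(rep("not_fat", 3), rep("fat", 3))
)

get_means_glue <- function(dat, metric, group) {
  dat %>%
    group_by(across(all_of(group))) %>%
    # "{metric}" interpolates the string value, here "weight"
    summarise("mean_{metric}" := mean(.data[[metric]]), .groups = "drop")
}

res <- get_means_glue(toy, metric = "weight", group = "type")
```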
See ?dplyr_data_masking for a more detailed discussion.