Passing values from multiple columns into dplyr summarise function
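Please consider the following minimal example: I have combined the observations from two experiments, A and B, into a dplyr tibble. llim and ulim define the lower and upper limits of the observable in each group.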
library(dplyr)
name <- factor (c (rep('A', 400),
rep('B', 260)
)
)
obs <- c (sample(-23:28, 400, replace = TRUE),
sample(-15:39, 260, replace = TRUE)
)
llim <- c (rep(-23, 400),
rep(-15, 260)
)
ulim <- c (rep(28, 400),
rep(39, 260)
)
tib1 <- tibble (name, obs, llim, ulim)
tib1
# A tibble: 660 x 4
name obs llim ulim
<fct> <int> <dbl> <dbl>
1 A 22 -23 28
2 A -5 -23 28
3 A 2 -23 28
4 A 9 -23 28
5 A -1 -23 28
6 A -21 -23 28
7 A 13 -23 28
8 A 0 -23 28
9 A 8 -23 28
10 A -11 -23 28
# … with 650 more rows
Next, I compute a histogram of the observable for each group. This works fine as long as I use the default arguments of hist().
tib1 %>% group_by(name) %>%
summarise (counts = hist(obs, plot = FALSE)$counts)
`summarise()` has grouped output by 'name'. You can override using the `.groups` argument.
# A tibble: 22 x 2
# Groups: name [2]
name counts
<fct> <int>
1 A 26
2 A 44
3 A 39
4 A 32
5 A 42
6 A 34
7 A 44
8 A 41
9 A 39
10 A 37
# … with 12 more rows
Now I would like to adjust these histograms using other group-specific parameters stored in the tibble, e.g. llim and ulim. However, this does not seem to work:
tib1 %>% group_by(name) %>%
summarise (counts = hist (obs,
breaks = seq (llim,
ulim,
by = 1
),
plot = FALSE
)$counts
)
Error: Problem with `summarise()` input `counts`.
✖ 'from' must be of length 1
ℹ Input `counts` is `hist(obs, breaks = seq(llim, ulim, by = 1), plot = FALSE)$counts`.
ℹ The error occurred in group 1: name = "A".
Run `rlang::last_error()` to see where the error occurred.
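Is there a way to pass the values of the columns llim and ulim into the hist() function? Or is something else wrong? The error message is a bit vague...
Many thanks for your help!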
This gives the histograms of obs by group name:
library(ggplot2)
ggplot(tib1, aes(x = obs)) +
geom_histogram(aes(color = name, fill = name),
position = "identity", bins = 30, alpha = 0.4) +
scale_color_manual(values = c("blue", "red")) +
scale_fill_manual(values = c("blue", "red"))
Reducing llim and ulim to length 1 (e.g. with max() or min()) does the trick:
tib1 %>% group_by(name, llim, ulim) %>%
summarise (counts = hist (obs,
breaks = seq (max(llim),
max(ulim),
by = 1
),
plot = FALSE
)$counts
)
# A tibble: 105 x 4
# Groups: name, llim, ulim [2]
name llim ulim counts
<fct> <dbl> <dbl> <int>
1 A -23 28 9
2 A -23 28 9
3 A -23 28 8
4 A -23 28 7
5 A -23 28 5
6 A -23 28 8
7 A -23 28 14
8 A -23 28 10
9 A -23 28 9
10 A -23 28 9
# … with 95 more rows
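The failure in the question can be reproduced without dplyr. Inside a grouped summarise(), a column name such as llim refers to all of that group's values (400 identical numbers for group A), while seq() accepts only a single value for from. The sketch below uses a hypothetical stand-in vector, llim_group, in place of the grouped column:

```r
# Hypothetical stand-in for the llim column of group A (400 identical values)
llim_group <- rep(-23, 400)

# seq() requires a scalar `from`, so passing the whole column fails;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(
  seq(llim_group, 28, by = 1),
  error = function(e) conditionMessage(e)
)
print(msg)

# Collapsing the column to one value yields valid break points for group A
breaks_a <- seq(llim_group[1], 28, by = 1)
length(breaks_a)
```

Any function that collapses the column to a single value (max(), min(), or simply taking the first element) works here, because the limits are constant within each group.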
So the error message makes sense after all...
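A possible variant, sketched under a few assumptions: dplyr 1.1.0 or later is installed (that release introduced reframe() for results with more than one row per group and deprecated such results in summarise()), and first() is used instead of max() since the limits are constant within a group. The set.seed() call and the bin_start column are additions for illustration and are not part of the original post.

```r
library(dplyr)

set.seed(1)  # assumption: added only so the sketch is reproducible

# Same layout as tib1 in the question, built with rep(..., times = ...)
tib1 <- tibble(
  name = factor(rep(c("A", "B"), times = c(400, 260))),
  obs  = c(sample(-23:28, 400, replace = TRUE),
           sample(-15:39, 260, replace = TRUE)),
  llim = rep(c(-23, -15), times = c(400, 260)),
  ulim = rep(c(28, 39), times = c(400, 260))
)

counts_tbl <- tib1 %>%
  group_by(name) %>%
  reframe(
    # left edge of each one-unit bin (hypothetical helper column)
    bin_start = seq(first(llim), first(ulim) - 1, by = 1),
    # first() reduces the per-group columns to the scalars seq() needs
    counts = hist(obs,
                  breaks = seq(first(llim), first(ulim), by = 1),
                  plot = FALSE)$counts
  )

counts_tbl
```

Grouping by name alone is enough in this form, and keeping the bin edge next to each count makes the result easier to interpret. On older dplyr versions, replacing reframe() with summarise() should behave the same way as the answer above.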