dplyr::group_by() with multiple variables but NOT intersection
When you group_by multiple variables, dplyr helpfully finds the intersection of those groups.
For example,
mtcars %>%
  group_by(cyl, am) %>%
  summarise(mean(disp))
yields
Source: local data frame [6 x 3]
Groups: cyl [?]

    cyl    am `mean(disp)`
  <dbl> <dbl>        <dbl>
1     4     0     135.8667
2     4     1      93.6125
3     6     0     204.5500
4     6     1     155.0000
5     8     0     357.6167
6     8     1     326.0000
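My question is: is there a way to supply multiple variables but summarise marginally? I would like the output to look like what you would get by doing it manually, one variable at a time.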
df_1 <-
  mtcars %>%
  group_by(cyl) %>%
  summarise(est = mean(disp)) %>%
  transmute(group = paste0("cyl_", cyl), est)

df_2 <-
  mtcars %>%
  group_by(am) %>%
  summarise(est = mean(disp)) %>%
  transmute(group = paste0("am_", am), est)

bind_rows(df_1, df_2)
The above code produces
# A tibble: 5 × 2
  group      est
  <chr>    <dbl>
1 cyl_4 105.1364
2 cyl_6 183.3143
3 cyl_8 353.1000
4  am_0 290.3789
5  am_1 143.5308
Ideally, the syntax would be something like
mtcars %>%
  group_by(cyl, am, intersection = FALSE) %>%
  summarise(est = mean(disp))
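Does something like this exist in the tidyverse?
(P.S. I know the group variable in the table above isn't tidy because it holds two variables, but I promise it's tidy for my purposes, okay? :))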
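I guess what you're looking for is the tidyr package...
gather first replicates the dataset so that there are n rows for each factor you are grouping by; mutate then creates the grouping variable.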
library(dplyr)
library(tidyr)

mtcars %>%
  gather(col, value, cyl, am) %>%
  mutate(group = paste(col, value, sep = "_")) %>%
  group_by(group) %>%
  summarise(est = mean(disp))
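For readers on a recent tidyr, where gather() is marked as superseded, the same reshape-then-group idea can be sketched with pivot_longer(). This is only an illustration and not part of the original answer; the object name marginal_means is a placeholder chosen here.

```r
library(dplyr)
library(tidyr)

# Assumption: tidyr >= 1.0.0, which provides pivot_longer().
marginal_means <- mtcars %>%
  # Stack the two grouping columns, so every original row appears
  # once per grouping variable (cyl and am are both numeric).
  pivot_longer(c(cyl, am), names_to = "col", values_to = "value") %>%
  # Build labels such as "cyl_4" or "am_0".
  mutate(group = paste(col, value, sep = "_")) %>%
  group_by(group) %>%
  summarise(est = mean(disp))

marginal_means
```

Like the gather() version, this returns one row per level of each variable, ordered alphabetically by group rather than in the order the variables were listed.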
A purrr alternative:
library(tidyverse)

map(c('cyl', 'am'),
    ~ mtcars %>%
        group_by_(.x) %>%
        summarise(est = mean(disp)) %>%
        transmute_(group = lazyeval::interp(~paste0(.x, '_', y), y = as.name(.x)),
                   ~est)) %>%
  bind_rows()
# A tibble: 5 × 2
  group      est
  <chr>    <dbl>
1 cyl_4 105.1364
2 cyl_6 183.3143
3 cyl_8 353.1000
4  am_0 290.3789
5  am_1 143.5308
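The underscore verbs (group_by_, transmute_) and lazyeval are deprecated in current dplyr, so here is a rough sketch of the same map-then-bind idea using the .data pronoun. The helper name marginal_by is hypothetical, introduced only for this illustration, and it hard-codes disp as the summarised column to mirror the code above.

```r
library(dplyr)
library(purrr)

# Hypothetical helper (not from the original answer): summarise `disp`
# separately for each grouping variable named in `vars`, then stack the results.
marginal_by <- function(data, vars) {
  map(vars, function(v) {
    data %>%
      # .data[[v]] looks the column up by its name held in the string v
      group_by(value = .data[[v]]) %>%
      summarise(est = mean(disp)) %>%
      transmute(group = paste0(v, "_", value), est)
  }) %>%
    bind_rows()
}

marginal_by(mtcars, c("cyl", "am"))
```

Because each variable is summarised on its own, the rows come back in the order the variables were supplied, which matches the manual df_1/df_2 result in the question.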
It's simpler with the plyr package.
library(plyr)

mtcars %>%
  ddply(c("cyl", "am"), .fun = function(x) {
    mean(x$disp)
  })