Summarise multiple columns that have to be grouped (tidyverse)
I have a data frame that contains data like the following:
df <- data.frame(
group1 = c("High","High","High","Low","Low","Low"),
group2 = c("male","female","male","female","male","female"),
one = c("yes","yes","yes","yes","no","no"),
two = c("no","yes","no","yes","yes","yes"),
three = c("yes","no","no","no","yes","yes")
)
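I want to summarise the yes/no counts in the variables one, two and three. Normally I would do this with df %>% group_by(group1,group2,one) %>% summarise(n()). Is there a way to summarise all three columns and bind the results into a single output df, without running the code manually on each column? I tried a for loop, but I couldn't get group_by() to recognise the colname I passed to it as input.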
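On the loop attempt: group_by() treats a bare name as a column, so a variable holding a column name as a string is not looked up automatically. A minimal sketch of one way around this, assuming dplyr 1.0 or later, is to index the .data pronoun with the string. The helper count_one() and the output column names variable and value are made up here for illustration and are not part of the question.

```r
library(dplyr)

df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"),
  two = c("no","yes","no","yes","yes","yes"),
  three = c("yes","no","no","no","yes","yes")
)

cols <- c("one", "two", "three")

# Hypothetical helper: count yes/no for a single column given as a string.
# .data[[col]] looks the column up by name; naming it `value` gives every
# per-column result the same layout so they can be stacked afterwards.
count_one <- function(data, col) {
  data %>%
    group_by(group1, group2, value = .data[[col]]) %>%
    summarise(n = n(), .groups = "drop")
}

# Run the helper for each column and stack the results; the list names
# end up in the `variable` column.
per_column <- bind_rows(
  lapply(setNames(cols, cols), function(col) count_one(df, col)),
  .id = "variable"
)
per_column
```

The result keeps one block of rows per source column, which is the same information as running the original group_by()/summarise() call three times by hand.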
Get the data into long format and count:
library(dplyr)
library(tidyr)
df %>% pivot_longer(cols = one:three) %>% count(group1, group2, value)
# group1 group2 value n
# <chr> <chr> <chr> <int>
#1 High female no 1
#2 High female yes 2
#3 High male no 3
#4 High male yes 3
#5 Low female no 2
#6 Low female yes 4
#7 Low male no 1
#8 Low male yes 2
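The count above pools one, two and three together. If the counts are wanted per source column, as in the question's group_by(group1,group2,one) call, a small variation is to keep the column that pivot_longer() creates for the original column names. This is a sketch under that assumption; the names variable, per_variable and wide are chosen here and do not come from the answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"),
  two = c("no","yes","no","yes","yes","yes"),
  three = c("yes","no","no","no","yes","yes")
)

# Long format, keeping the source column name in `variable`
per_variable <- df %>%
  pivot_longer(cols = one:three, names_to = "variable", values_to = "value") %>%
  count(group1, group2, variable, value)

# Optional: one row per group and variable, with yes/no as separate columns.
# Combinations that never occur are filled with 0.
wide <- per_variable %>%
  pivot_wider(names_from = value, values_from = n, values_fill = 0L)

per_variable
wide
```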
This can also be done with dplyr alone (without tidyr::pivot_*), although the output format is slightly different. (It works even without rowwise, though I don't know exactly why.)
df <- data.frame(
group1 = c("High","High","High","Low","Low","Low"),
group2 = c("male","female","male","female","male","female"),
one = c("yes","yes","yes","yes","no","no"),
two = c("no","yes","no","yes","yes","yes"),
three = c("yes","no","no","no","yes","yes")
)
library(dplyr)
df %>%
  group_by(group1, group2) %>%
  summarise(yes_count = sum(c_across(one:three) == 'yes'),
            no_count = sum(c_across(one:three) == 'no'), .groups = 'drop')
#> # A tibble: 4 x 4
#> group1 group2 yes_count no_count
#> <chr> <chr> <int> <int>
#> 1 High female 2 1
#> 2 High male 3 3
#> 3 Low female 4 2
#> 4 Low male 2 1
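As for why rowwise is not needed: my understanding, which the answer itself does not confirm, is that c_across() concatenates the selected columns of the current group into one vector. Inside a grouped summarise() that vector holds every value of one, two and three for the group, so sum(... == 'yes') counts across all three columns at once. The sketch below checks that reading against a manual calculation for one group and, for comparison, shows across() producing a separate count per column. The names pooled, per_col, high_male and the _yes suffix are assumptions for illustration.

```r
library(dplyr)

df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"),
  two = c("no","yes","no","yes","yes","yes"),
  three = c("yes","no","no","no","yes","yes")
)

# Pooled count: c_across() combines one, two and three within each group
pooled <- df %>%
  group_by(group1, group2) %>%
  summarise(yes_count = sum(c_across(one:three) == "yes"), .groups = "drop")

# Manual check for a single group: flatten the three columns and count
high_male <- df[df$group1 == "High" & df$group2 == "male", c("one", "two", "three")]
manual_high_male <- sum(unlist(high_male) == "yes")

# Per-column alternative: across() applies the count to each column separately
per_col <- df %>%
  group_by(group1, group2) %>%
  summarise(across(one:three, ~ sum(.x == "yes"), .names = "{.col}_yes"),
            .groups = "drop")

pooled
per_col
```

The per-column counts in per_col add up, row by row, to yes_count in pooled.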
Created on 2021-05-12 by the reprex package (v2.0.0)
Using data.table:
library(data.table)
melt(setDT(df), id.var = c('group1', 'group2'))[, .(n = .N),
.(group1, group2, value)]
Output:
group1 group2 value n
1: High male yes 3
2: High female yes 2
3: Low female yes 4
4: Low male no 1
5: Low female no 2
6: High male no 3
7: Low male yes 2
8: High female no 1
With base R, we can use by and table:
by(df[3:5], df[1:2], function(x) table(unlist(x)))
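by() returns a list-like object rather than a data frame. If a data frame similar to the other answers' output is preferred, one possible base R sketch is to stack the three columns by hand and tabulate. The block rebuilds df as a plain data.frame on purpose: setDT() converts df in place, and on a data.table df[3:5] selects rows rather than columns. The names value_cols, long and counts are chosen here for illustration.

```r
df <- data.frame(
  group1 = c("High","High","High","Low","Low","Low"),
  group2 = c("male","female","male","female","male","female"),
  one = c("yes","yes","yes","yes","no","no"),
  two = c("no","yes","no","yes","yes","yes"),
  three = c("yes","no","no","no","yes","yes")
)

value_cols <- c("one", "two", "three")

# unlist() stacks the columns one after another, so the grouping columns
# are repeated once per stacked column to stay aligned with the values.
long <- data.frame(
  group1 = rep(df$group1, times = length(value_cols)),
  group2 = rep(df$group2, times = length(value_cols)),
  value = unlist(df[value_cols], use.names = FALSE)
)

# Cross-tabulate and turn the table back into a data frame of counts
counts <- as.data.frame(table(long), responseName = "n")
counts
```

Unlike count(), table() reports every combination of the observed levels, so a combination with no rows would appear with n equal to 0.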