How to summarize by category and count a subgroup in dplyr
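Using the built-in Titanic dataset, I can currently count the number of observations for each level of the variable Class. How can I add new columns with the counts for Survived = 'Yes' and Survived = 'No'?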
library(dplyr)
as.data.frame(Titanic) %>%
mutate_if(is.character, as.factor) %>%
group_by(Class) %>%
summarise("Number of Observations" = n() )
# A tibble: 4 × 2
Class `Number of Observations`
<fct> <int>
1 1st 8
2 2nd 8
3 3rd 8
4 Crew 8
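The count of 8 per class comes from how `as.data.frame(Titanic)` is built: it has one row per combination of Class, Sex, Age and Survived (4 × 2 × 2 × 2 = 32 rows) plus a `Freq` column holding the number of passengers, so `n()` counts combinations rather than people. The grouping columns are already factors, so the `mutate_if(is.character, as.factor)` step changes nothing here. A quick check, as a sketch (`titanic_df`, `rows` and `passengers` are illustrative names):

```r
library(dplyr)

titanic_df <- as.data.frame(Titanic)

# One row per Class x Sex x Age x Survived combination, plus a Freq column
dim(titanic_df)   # 32 rows, 5 columns
str(titanic_df)   # Class, Sex, Age and Survived are already factors

# n() counts rows (combinations); sum(Freq) counts passengers
titanic_df %>%
  group_by(Class) %>%
  summarise(rows = n(), passengers = sum(Freq))
```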
I'd like to get something like this:
# A tibble: 4 × 4
Class `Number of Observations` Survived.Yes Survived.No
<fct> <int> <int> <int>
1 1st 8 4 4
2 2nd 8 4 4
3 3rd 8 4 4
4 Crew 8 4 4
I tried adding Survived to the grouping, but then the counts come out on separate rows:
as.data.frame(Titanic) %>%
mutate_if(is.character, as.factor) %>%
group_by(Class, Survived) %>%
summarise("Number of Observations" = n() )
# A tibble: 8 × 3
# Groups: Class [4]
Class Survived `Number of Observations`
<fct> <fct> <int>
1 1st No 4
2 1st Yes 4
3 2nd No 4
4 2nd Yes 4
5 3rd No 4
6 3rd Yes 4
7 Crew No 4
8 Crew Yes 4
Any suggestions would be appreciated. Thanks!
You can use sum(Survived == "Yes") to count the number of "Yes" values in each group.
library(dplyr)
as.data.frame(Titanic) %>%
group_by(Class) %>%
summarise(
"Number of Observations" = n(),
across(Survived, list(Yes = ~ sum(. == "Yes"),
No = ~ sum(. == "No"))))
# # A tibble: 4 x 4
# Class `Number of Observations` Survived_Yes Survived_No
# <fct> <int> <int> <int>
# 1 1st 8 4 4
# 2 2nd 8 4 4
# 3 3rd 8 4 4
# 4 Crew 8 4 4
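The `across()` answer above counts rows of the flattened table, which matches the question. If the goal is instead the number of passengers, a minimal variant, assuming you want each row weighted by its `Freq` value, is to sum `Freq` within each subgroup (the column name `Number of Passengers` and the object name `passengers` are illustrative):

```r
library(dplyr)

# Weight each Class/Sex/Age/Survived row by its Freq (passenger count)
passengers <- as.data.frame(Titanic) %>%
  group_by(Class) %>%
  summarise(
    `Number of Passengers` = sum(Freq),
    Survived_Yes = sum(Freq[Survived == "Yes"]),
    Survived_No  = sum(Freq[Survived == "No"])
  )
passengers
```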
You can also use pivot_wider() from tidyr:
library(dplyr)
library(tidyr)
as.data.frame(Titanic) %>%
add_count(Class, name = "Number of Observations") %>%
pivot_wider(id_cols = c(Class, last_col()),
names_from = Survived, names_prefix = "Survived_",
values_from = Survived, values_fn = length)
# # A tibble: 4 x 4
# Class `Number of Observations` Survived_No Survived_Yes
# <fct> <int> <int> <int>
# 1 1st 8 4 4
# 2 2nd 8 4 4
# 3 3rd 8 4 4
# 4 Crew 8 4 4
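Another way to combine the grouped attempt from the question with `pivot_wider()` is to count each Class/Survived pair with `count()`, spread the counts into columns, and then add the per-class total. This is a sketch of that alternative, not the answer's own method; `wide` is an illustrative name:

```r
library(dplyr)
library(tidyr)

# Count rows per Class/Survived pair, spread Survived into columns,
# then add the per-class total right after Class
wide <- as.data.frame(Titanic) %>%
  count(Class, Survived) %>%
  pivot_wider(names_from = Survived, values_from = n,
              names_prefix = "Survived_") %>%
  mutate(`Number of Observations` = Survived_No + Survived_Yes,
         .after = Class)
wide
```

Passing `wt = Freq` to `count()` would count passengers instead of table rows.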
You don't even need any add-on packages; base R's xtabs() and addmargins() do the job:
addmargins(xtabs(~ Class + Survived, Titanic), 2)
# Survived
# Class No Yes Sum
# 1st 4 4 8
# 2nd 4 4 8
# 3rd 4 4 8
# Crew 4 4 8
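Without a left-hand side, `xtabs()` counts rows of the flattened table, which is why each cell is 4. A minimal base R sketch for passenger counts, assuming that is what you want, puts `Freq` on the left-hand side; `margin.table()` gives the same numbers directly from the 4-D `Titanic` array (dimension 1 is Class, dimension 4 is Survived). `passenger_tab` is an illustrative name:

```r
# Freq on the left-hand side makes xtabs() sum passengers instead of counting rows
passenger_tab <- addmargins(
  xtabs(Freq ~ Class + Survived, data = as.data.frame(Titanic)), 2)
passenger_tab

# Same numbers straight from the array: keep dimensions 1 (Class) and 4 (Survived)
addmargins(margin.table(Titanic, c(1, 4)), 2)
```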