How to summarize by category and count of a subgroup in dplyr

Question

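Using the built-in Titanic dataset, I can currently count the number of observations for each level of the variable Class. How can I create new columns that count the rows where Survived = 'Yes' and Survived = 'No'?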

library(dplyr)

as.data.frame(Titanic) %>% 
  mutate_if(is.character, as.factor) %>% 
  group_by(Class) %>%
  summarise("Number of Observations" = n())

# A tibble: 4 × 2
  Class `Number of Observations`
  <fct>                    <int>
1 1st                          8
2 2nd                          8
3 3rd                          8
4 Crew                         8

I would like to get something like this:

# A tibble: 4 × 4
  Class `Number of Observations` Survived.Yes Survived.No
  <fct>                    <int>        <int>       <int>
1 1st                          8            4           4
2 2nd                          8            4           4
3 3rd                          8            4           4
4 Crew                         8            4           4

I tried adding Survived to the grouping, but then the counts end up in separate rows instead of separate columns.

as.data.frame(Titanic) %>% 
  mutate_if(is.character, as.factor) %>% 
  group_by(Class, Survived) %>%
  summarise("Number of Observations" = n() )

# A tibble: 8 × 3
# Groups:   Class [4]
  Class Survived `Number of Observations`
  <fct> <fct>                       <int>
1 1st   No                              4
2 1st   Yes                             4
3 2nd   No                              4
4 2nd   Yes                             4
5 3rd   No                              4
6 3rd   Yes                             4
7 Crew  No                              4
8 Crew  Yes                             4

Any suggestions would be appreciated. Thanks.

Answer 1

You can use sum(Survived == "Yes") to count the number of "Yes" values in each group.

as.data.frame(Titanic) %>% 
  group_by(Class) %>%
  summarise(
    "Number of Observations" = n(),
    across(Survived, list(Yes = ~ sum(. == "Yes"),
                          No  = ~ sum(. == "No"))))

# # A tibble: 4 x 4
#   Class `Number of Observations` Survived_Yes Survived_No
#   <fct>                    <int>        <int>       <int>
# 1 1st                          8            4           4
# 2 2nd                          8            4           4
# 3 3rd                          8            4           4
# 4 Crew                         8            4           4
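
The sum(Survived == "Yes") idiom works because the comparison returns a logical vector and sum() treats TRUE as 1 and FALSE as 0. A minimal base R sketch of the idea, using a made-up toy vector (the name survived and its values are hypothetical, not taken from the Titanic data):

```r
# Hypothetical toy vector, for illustration only
survived <- c("Yes", "No", "Yes", "Yes", "No")

# The comparison yields TRUE/FALSE for each element
is_yes <- survived == "Yes"

# sum() counts TRUE as 1, so this is the number of "Yes" values
n_yes <- sum(is_yes)

# Negating the logical vector counts the rest (valid here because there are no NAs)
n_no <- sum(!is_yes)

print(c(Yes = n_yes, No = n_no))
```

If the column might contain missing values, passing na.rm = TRUE to sum() is probably what you want.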

You can also use pivot_wider() from tidyr:

library(tidyr)

as.data.frame(Titanic) %>%
  add_count(Class, name = "Number of Observations") %>%
  pivot_wider(id_cols = c(Class, last_col()),
              names_from = Survived, names_prefix = "Survived_",
              values_from = Survived, values_fn = length)

# # A tibble: 4 x 4
#   Class `Number of Observations` Survived_No Survived_Yes
#   <fct>                    <int>       <int>        <int>
# 1 1st                          8           4            4
# 2 2nd                          8           4            4
# 3 3rd                          8           4            4
# 4 Crew                         8           4            4

You don't even need to attach any additional packages; base R can do this:

addmargins(xtabs(~ Class + Survived, Titanic), 2)

#       Survived
# Class  No Yes Sum
#   1st   4   4   8
#   2nd   4   4   8
#   3rd   4   4   8
#   Crew  4   4   8
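
Note that all of the counts above are counts of rows: as.data.frame(Titanic) holds one row per Class/Sex/Age/Survived combination, which is why every class shows 8. If the number of passengers is what is actually wanted (an assumption about intent, not something the question states), a possible sketch is to put the Freq column on the left-hand side of the xtabs() formula so that it is summed rather than the rows being counted:

```r
# Titanic ships with base R (datasets package); Freq holds the passenger counts
titanic_df <- as.data.frame(Titanic)

# With Freq on the left-hand side, xtabs() sums Freq within each cell
# instead of counting rows
passengers <- xtabs(Freq ~ Class + Survived, data = titanic_df)

# Margin 2 adds a "Sum" column, i.e. the total per class
passengers_with_total <- addmargins(passengers, 2)
print(passengers_with_total)
```

The same weighting could be expressed in dplyr with something like sum(Freq[Survived == "Yes"]) inside summarise().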
