How to combine two different dplyr summaries in a single command

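I'm trying to create a grouped summary that reports the number of records in each group and also shows the means of a series of variables.

I can only work out how to do this as two separate summaries, which I then join together. This works fine, but I wonder whether there is a more elegant way to do it?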

dailyn <- daily %>% # this summarises n
  group_by(type) %>%
  summarise(n = n())

dailymeans <- daily %>% # this summarises the means
  group_by(type) %>%
  summarise_at(vars(starts_with("d.")), funs(mean(., na.rm = TRUE)))

dailysummary <- inner_join(dailyn, dailymeans, by = "type") # this joins the two parts together

The data I'm working with is a data frame like this:

daily<-data.frame(type=c("A","A","B","C","C","C"),
                  d.happy=c(1,5,3,7,2,4),
                  d.sad=c(5,3,6,3,1,2))

Similarly, you could try:

daily %>% 
  group_by(type) %>% 
  mutate(n = n()) %>% 
  mutate_at(vars(starts_with("d.")),funs(mean(., na.rm = TRUE))) %>%
  unique

Which gives:

Source: local data frame [3 x 4]
Groups: type [3]

    type  d.happy d.sad     n
  <fctr>    <dbl> <dbl> <int>
1      A 3.000000     4     2
2      B 3.000000     6     1
3      C 4.333333     2     3

You can do this in one call by grouping, using mutate instead of summarize, and then using slice() to keep the first row of each type:

daily %>% group_by(type) %>% 
  mutate(n = n()) %>% 
  mutate_at(vars(starts_with("d.")),funs(mean(., na.rm = TRUE))) %>% 
  slice(1L)

Edit: It may be clearer how this works in this modified example:

daily_summary <- daily %>% group_by(type) %>% 
  mutate(n = n()) %>% 
  mutate_at(vars(starts_with("d.")),funs("mean" = mean(., na.rm = TRUE)))

daily_summary
# Source: local data frame [6 x 6]
# Groups: type [3]
# 
# # A tibble: 6 x 6
#    type d.happy d.sad     n d.happy_mean d.sad_mean
#  <fctr>   <dbl> <dbl> <int>        <dbl>      <dbl>
#1      A       1     5     2     3.000000          4
#2      A       5     3     2     3.000000          4
#3      B       3     6     1     3.000000          6
#4      C       7     3     3     4.333333          2
#5      C       2     1     3     4.333333          2
#6      C       4     2     3     4.333333          2

daily_summary %>% 
  slice(1L)

# Source: local data frame [3 x 6]
# Groups: type [3]
# 
# # A tibble: 3 x 6
#    type d.happy d.sad     n d.happy_mean d.sad_mean
#  <fctr>   <dbl> <dbl> <int>        <dbl>      <dbl>
#1      A       1     5     2     3.000000          4
#2      B       3     6     1     3.000000          6
#3      C       7     3     3     4.333333          2
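
For readers on a newer dplyr, the scoped verbs and funs() used above have since been superseded. A minimal sketch of the same grouped summary in a single summarise() call, assuming dplyr 1.0.0 or later (where across() and the .groups argument are available); the name daily_across is only illustrative:

```r
library(dplyr)

# Same example data as in the question
daily <- data.frame(type = c("A", "A", "B", "C", "C", "C"),
                    d.happy = c(1, 5, 3, 7, 2, 4),
                    d.sad = c(5, 3, 6, 3, 1, 2))

# One summarise() call: the group size plus the mean of every "d." column
# (assumes dplyr >= 1.0.0 for across() and .groups)
daily_across <- daily %>%
  group_by(type) %>%
  summarise(n = n(),
            across(starts_with("d."), ~ mean(.x, na.rm = TRUE)),
            .groups = "drop")

daily_across
```

Because this summarises rather than mutates, each group collapses to one row directly, so no unique or slice(1L) step is needed afterwards.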