group_by and count number of elements in each column in R

I have a data table that looks like this:

city         year    t_20   t_25 
Seattle      2019    82      91  
Seattle      2018     0      103   
NYC          2010    78       8 
DC           2011    71       0  
DC           2011     0       0    
DC           2018    60       0

I want to group the rows by city and year and count the number of zeros in each group.

How can I do this? With summarize_at?

df %>% group_by(city, year) %>% summarise_at( WHAT GOES HERE , vars(t_20:t_25))

What should the first argument of summarize_at be?

Or is there another way, such as tally?

One option is to reshape the data from wide to long before summarising:

library(tidyverse)
df %>%
    gather(k, v, -city, -year) %>%    # stack t_20 and t_25 into key/value columns
    group_by(city, year) %>%
    summarise(n_0 = sum(v == 0))      # count zeros across both columns per group
# # A tibble: 5 x 3
# # Groups:   city [?]
#  city     year   n_0
#  <fct>   <int> <int>
#1 DC       2011     3
#2 DC       2018     1
#3 NYC      2010     0
#4 Seattle  2018     1
#5 Seattle  2019     0
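gather() still works, but it has been superseded since tidyr 1.0.0 in favour of pivot_longer(). A minimal sketch of the same long-format approach with pivot_longer(), assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (for the .groups argument); the key/value column names k and v simply mirror the gather() call above:

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- read.table(text = "
city    year t_20 t_25
Seattle 2019   82   91
Seattle 2018    0  103
NYC     2010   78    8
DC      2011   71    0
DC      2011    0    0
DC      2018   60    0", header = TRUE)

zeros_long <- df %>%
  # Stack t_20 and t_25 into one value column (v), keeping the source name in k
  pivot_longer(c(t_20, t_25), names_to = "k", values_to = "v") %>%
  group_by(city, year) %>%
  # Count zeros across both columns in each city/year group, then ungroup
  summarise(n_0 = sum(v == 0), .groups = "drop")

zeros_long
```

The n_0 column should match the gather() output above (3, 1, 0, 1, 0 for DC 2011, DC 2018, NYC 2010, Seattle 2018, Seattle 2019).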

Alternatively, you can count the zeros in each column separately:

df %>%
    group_by(city, year) %>%
    summarise_all(~ sum(. == 0))    # funs() is deprecated; use a formula lambda
# # A tibble: 5 x 4
# # Groups:   city [?]
#  city     year  t_20  t_25
#  <fct>   <int> <int> <int>
#1 DC       2011     1     2
#2 DC       2018     0     1
#3 NYC      2010     0     0
#4 Seattle  2018     1     0
#5 Seattle  2019     0     0
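To answer the summarise_at() question directly: summarise_at() takes the columns (.vars) first and the function (.funs) second, so the vars(t_20:t_25) part goes before the counting function rather than after it. In dplyr 1.0.0 and later the _at/_all variants are superseded by across() inside summarise(); a sketch of both, assuming dplyr >= 1.0.0:

```r
library(dplyr)

# Sample data from the question
df <- read.table(text = "
city    year t_20 t_25
Seattle 2019   82   91
Seattle 2018    0  103
NYC     2010   78    8
DC      2011   71    0
DC      2011    0    0
DC      2018   60    0", header = TRUE)

# summarise_at(): columns first (.vars), then the function (.funs)
zeros_at <- df %>%
  group_by(city, year) %>%
  summarise_at(vars(t_20:t_25), ~ sum(. == 0))

# Current idiom: across() applies the lambda to each selected column
zeros_across <- df %>%
  group_by(city, year) %>%
  summarise(across(t_20:t_25, ~ sum(.x == 0)), .groups = "drop")

zeros_across
```

Both produce one zero count per column per city/year group, matching the summarise_all() output above.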

Sample data

df <- read.table(text =
    "city         year    t_20   t_25
Seattle      2019    82      91
Seattle      2018     0      103
NYC          2010    78       8
DC           2011    71       0
DC           2011     0       0
DC           2018    60       0", header = TRUE)

A simple grouping operation like this is naturally expressed in SQL. For those who prefer SQL, the problem can also be solved with the sqldf package:

library(sqldf)

# Count the zeros in each column per city/year group
sql <- "SELECT city, year,
               COUNT(CASE WHEN t_20 = 0 THEN 1 END) AS t_20_cnt,
               COUNT(CASE WHEN t_25 = 0 THEN 1 END) AS t_25_cnt
        FROM df
        GROUP BY city, year"

output <- sqldf(sql)
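When sqldf runs on its default SQLite backend (assuming RSQLite is installed), a comparison such as t_20 = 0 evaluates to 1 or 0, so the CASE expression can be shortened to SUM(t_20 = 0). A sketch grouping by both city and year, as the question asks; the ORDER BY is added only so the row order is deterministic:

```r
library(sqldf)

# Sample data from the question
df <- read.table(text = "
city    year t_20 t_25
Seattle 2019   82   91
Seattle 2018    0  103
NYC     2010   78    8
DC      2011   71    0
DC      2011    0    0
DC      2018   60    0", header = TRUE)

# In SQLite a boolean comparison yields 0/1, so SUM() counts matching rows
output_sum <- sqldf("SELECT city, year,
                            SUM(t_20 = 0) AS t_20_cnt,
                            SUM(t_25 = 0) AS t_25_cnt
                     FROM df
                     GROUP BY city, year
                     ORDER BY city, year")

output_sum
```

This shorthand relies on SQLite's integer booleans; the COUNT(CASE ...) form is more portable to other database backends.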