group_by and count number of elements in each column in R
I have a data table that looks like this:
city year t_20 t_25
Seattle 2019 82 91
Seattle 2018 0 103
NYC 2010 78 8
DC 2011 71 0
DC 2011 0 0
DC 2018 60 0
I want to group the rows by city and year and count the number of zeros in each group.
How can I do this? With summarize_at?
df %>% group_by(city, year) %>% summarise_at( WHAT GOES HERE , vars(t_20:t_25))
What should the first argument of summarize_at be?
Or is there another way, such as tally?
One option is to reshape the data from wide to long before summarising:
library(tidyverse)
df %>%
gather(k, v, -city, -year) %>%
group_by(city, year) %>%
summarise(n_0 = sum(v == 0))
# # A tibble: 5 x 3
# # Groups:   city [?]
#   city     year   n_0
#   <fct>   <int> <int>
# 1 DC       2011     3
# 2 DC       2018     1
# 3 NYC      2010     0
# 4 Seattle  2018     1
# 5 Seattle  2019     0
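In current tidyr releases gather() is superseded by pivot_longer(). The following is a minimal sketch of the same wide-to-long approach, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed. The sample data is rebuilt inline so the block runs on its own, the helper columns k and v keep the names used above, and the result name zeros_long is only illustrative.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt inline so this block is self-contained
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  t_20 = c(82L, 0L, 78L, 71L, 0L, 60L),
  t_25 = c(91L, 103L, 8L, 0L, 0L, 0L)
)

zeros_long <- df %>%
  # Stack every column except the grouping keys into name/value pairs
  pivot_longer(cols = -c(city, year), names_to = "k", values_to = "v") %>%
  group_by(city, year) %>%
  # One total per group: zeros counted across t_20 and t_25 together
  summarise(n_0 = sum(v == 0), .groups = "drop")

print(zeros_long)
```

If the columns may contain missing values, sum(v == 0, na.rm = TRUE) avoids an NA count for the affected groups.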
You can also summarise each column separately:
df %>%
group_by(city, year) %>%
summarise_all(funs(sum(. == 0)))
# # A tibble: 5 x 4
# # Groups:   city [?]
#   city     year  t_20  t_25
#   <fct>   <int> <int> <int>
# 1 DC       2011     1     2
# 2 DC       2018     0     1
# 3 NYC      2010     0     0
# 4 Seattle  2018     1     0
# 5 Seattle  2019     0     0
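To answer the summarise_at part of the question directly: summarise_at() takes the column selection first and the function second, so the two arguments in the question's call are in the reverse order. The sketch below shows that form and the across() form, assuming dplyr >= 1.0.0 (across() and the .groups argument are not available in older versions). The result names zeros_at and zeros_across are only illustrative.

```r
library(dplyr)

# Sample data from the question, rebuilt inline so this block is self-contained
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  t_20 = c(82L, 0L, 78L, 71L, 0L, 60L),
  t_25 = c(91L, 103L, 8L, 0L, 0L, 0L)
)

# summarise_at(): vars() selection first, then a formula-style function
zeros_at <- df %>%
  group_by(city, year) %>%
  summarise_at(vars(t_20:t_25), ~ sum(. == 0)) %>%
  ungroup()

# across(): the same per-column count inside a plain summarise()
zeros_across <- df %>%
  group_by(city, year) %>%
  summarise(across(t_20:t_25, ~ sum(.x == 0)), .groups = "drop")

print(zeros_across)
```

Both forms restrict the count to t_20 through t_25, which matters once the data gains other non-grouping columns that summarise_all() would also pick up.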
Sample data
df <- read.table(text =
"city year t_20 t_25
Seattle 2019 82 91
Seattle 2018 0 103
NYC 2010 78 8
DC 2011 71 0
DC 2011 0 0
DC 2018 60 0", header = TRUE)
A simple grouping operation lends itself well to SQL. For those who prefer SQL, the sqldf library can also solve this problem:
library(sqldf)
sql <- "SELECT city, year,
               COUNT(CASE WHEN t_20 = 0 THEN 1 END) AS t_20_cnt,
               COUNT(CASE WHEN t_25 = 0 THEN 1 END) AS t_25_cnt
        FROM df
        GROUP BY city, year"
output <- sqldf(sql)
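For comparison, the same per-group conditional count can be written without any extra packages. This is a minimal base R sketch using aggregate() with a formula; the result name zeros_base is only illustrative, and the sample data is rebuilt inline so the block runs on its own.

```r
# Sample data from the question, rebuilt inline so this block is self-contained
df <- data.frame(
  city = c("Seattle", "Seattle", "NYC", "DC", "DC", "DC"),
  year = c(2019L, 2018L, 2010L, 2011L, 2011L, 2018L),
  t_20 = c(82L, 0L, 78L, 71L, 0L, 60L),
  t_25 = c(91L, 103L, 8L, 0L, 0L, 0L)
)

# cbind() on the left-hand side applies FUN to each listed column
# within every city/year combination
zeros_base <- aggregate(
  cbind(t_20, t_25) ~ city + year,
  data = df,
  FUN = function(x) sum(x == 0)
)

print(zeros_base)
```

Note that the formula interface of aggregate() drops rows containing NA by default, so the counts can differ from the dplyr versions when the data has missing values.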