Calculate subtotals with dplyr and tidyr
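library(dplyr)
# Random example data: 4 countries x 3 sports, 0-3 medals each.
# expand.grid() returns country and sport as factors (stringsAsFactors = TRUE by default).
# Without set.seed() the counts, and therefore the output shown below, differ on every run.
expand.grid(country = c('Sweden','Norway', 'Denmark','Finland'),
            sport = c('curling','crosscountry','downhill')) %>%
  mutate(medals = sample(0:3, 12, TRUE)) ->
  data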
With reshape2's dcast this takes a single line. Giving the margins a custom name requires an extra step.
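library(reshape2)
data %>%
  dcast(country ~ sport, margins = TRUE, sum) %>%
  # optional renaming of the margins `(all)`
  rename(Total = `(all)`) %>%
  # country comes back as a factor, so convert it before ifelse();
  # otherwise the non-Total rows become factor level codes instead of names
  mutate(country = ifelse(country == "(all)", "Total", as.character(country)))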
My dplyr + tidyr approach is verbose. What is the best (compact and readable) way to write this with tidyr and dplyr?
library(dplyr)
library(tidyr)
# medal totals per sport (becomes the "Total" row)
data %>%
  group_by(sport) %>%
  summarise(medals = sum(medals)) %>%
  mutate(country = 'Total') ->
  sport_totals
# medal totals per country (becomes the "Total" column)
data %>%
  group_by(country) %>%
  summarise(medals = sum(medals)) %>%
  mutate(sport = 'Total') ->
  country_totals
# grand total (bottom-right cell)
data %>%
  summarise(medals = sum(medals)) %>%
  mutate(sport = 'Total',
         country = 'Total') ->
  totals
# stack everything in long format, then widen once
data %>%
  bind_rows(country_totals, sport_totals, totals) %>%
  spread(sport, medals)
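A more compact take on the same long-format idea, offered as a sketch rather than a definitive answer: instead of computing three separate summaries, stack copies of data relabelled as "Total" (country only, sport only, and both), aggregate once with a single group_by()/summarise(), then widen. It assumes dplyr 1.0 or later (for summarise(.groups = ) and relocate()) and tidyr 1.0 or later, using pivot_wider() in place of the superseded spread(). To make the result deterministic, data is rebuilt here from the medal counts in the output table of the answer below instead of sample().

```r
library(dplyr)
library(tidyr)

# Deterministic stand-in for the random data, using the medal counts from the
# output table in this post (expand.grid varies country fastest).
data <- expand.grid(country = c('Sweden', 'Norway', 'Denmark', 'Finland'),
                    sport = c('curling', 'crosscountry', 'downhill'),
                    stringsAsFactors = FALSE)
data$medals <- c(0, 1, 2, 3,   # curling
                 2, 1, 2, 0,   # crosscountry
                 0, 0, 1, 2)   # downhill

# Stack the original rows with copies relabelled as "Total", so one grouped
# sum yields the cell values, per-sport totals, per-country totals and grand total.
result <- bind_rows(
  data,
  mutate(data, country = "Total"),                  # totals per sport
  mutate(data, sport = "Total"),                    # totals per country
  mutate(data, country = "Total", sport = "Total")  # grand total
) %>%
  group_by(country, sport) %>%
  summarise(medals = sum(medals), .groups = "drop") %>%
  pivot_wider(names_from = sport, values_from = medals) %>%
  # group_by() sorts the keys, so move the Total column to the end explicitly
  relocate(Total, .after = last_col())

result
```

Because group_by() sorts the keys, countries come out in alphabetical order rather than the original order. Each additional grouping variable would need more relabelled copies, which is where dcast(margins = TRUE) stays shorter.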
I don't know whether this is the best (compact and readable) way, but it works ;)
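data %>%
  spread(sport, medals) %>%
  # row totals: columns 2:4 are the three sport columns
  mutate(Total = rowSums(.[2:4])) %>%
  # column totals: columns 2:5 are the sports plus Total
  rbind(., data.frame(country = "Total", t(colSums(.[2:5]))))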
  country curling crosscountry downhill Total
1  Sweden       0            2        0     2
2  Norway       1            1        0     2
3 Denmark       2            2        1     5
4 Finland       3            0        2     5
5   Total       6            5        3    14
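The same wide-first approach can be written without the hard-coded positions .[2:4] and .[2:5], so it keeps working if the number of sports changes. This is a sketch assuming dplyr 1.0 or later (for across() and where()) and tidyr 1.0 or later (for pivot_wider()); data is again rebuilt from the values in the table above rather than drawn with sample().

```r
library(dplyr)
library(tidyr)

# Deterministic stand-in for the random data (values from the table above;
# expand.grid varies country fastest).
data <- expand.grid(country = c('Sweden', 'Norway', 'Denmark', 'Finland'),
                    sport = c('curling', 'crosscountry', 'downhill'),
                    stringsAsFactors = FALSE)
data$medals <- c(0, 1, 2, 3, 2, 1, 2, 0, 0, 0, 1, 2)

result <- data %>%
  pivot_wider(names_from = sport, values_from = medals) %>%
  # row totals: sum every numeric (sport) column, however many there are
  mutate(Total = rowSums(across(where(is.numeric)))) %>%
  # column totals: magrittr passes the table as the first argument of
  # bind_rows(), which appends one summary row labelled "Total"
  bind_rows(summarise(., across(where(is.numeric), sum), country = "Total"))

result
```

where(is.numeric) assumes country is the only non-numeric column; with further identifier columns, the sport columns would need to be selected explicitly instead.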