Calculate the sum in a data.frame (long format)

I want to calculate the sums in this data.frame for the years 2005, 2006 and 2007 and the categories a, b and c.

year <- c(2005,2005,2005,2006,2006,2006,2007,2007,2007)
category <- c("a","a","a","b","b","b","c","c","c")
value <- c(3,6,8,9,7,4,5,8,9)
df <- data.frame(year, category,value, stringsAsFactors = FALSE)

The table should look like this:

year category value
2005 a 1
2005 a 1
2005 a 1
2006 b 2
2006 b 2
2006 b 2
2007 c 3
2007 c 3
2007 c 3
2006 a 3
2007 b 6
2008 c 9

Any idea how to achieve this? add_row or cbind?

How about using the dplyr package:

library(dplyr)

df %>%
  group_by(year, category) %>% 
  summarise(sum = sum(value))
# # A tibble: 3 × 3
# # Groups:   year [3]
#    year category   sum
#   <dbl> <chr>    <dbl>
# 1  2005 a           17
# 2  2006 b           20
# 3  2007 c           22

If you would rather add the sum as a new column instead of collapsing the rows, replace summarise() with mutate():

df %>% 
  group_by(year, category) %>% 
  mutate(sum = sum(value))
# # A tibble: 9 × 4
# # Groups:   year, category [3]
#    year category value   sum
#   <dbl> <chr>    <dbl> <dbl>
# 1  2005 a            3    17
# 2  2005 a            6    17
# 3  2005 a            8    17
# 4  2006 b            9    20
# 5  2006 b            7    20
# 6  2006 b            4    20
# 7  2007 c            5    22
# 8  2007 c            8    22
# 9  2007 c            9    22
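
Since the question asks about appending the totals as extra rows, here is a rough dplyr sketch of that variant. It assumes dplyr 1.0.0 or later (for the `.groups` argument); the names `totals` and `combined` are only illustrative and not part of the original question.

```r
library(dplyr)

df <- data.frame(
  year = c(2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007),
  category = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  value = c(3, 6, 8, 9, 7, 4, 5, 8, 9),
  stringsAsFactors = FALSE
)

# One row per year/category pair. The total reuses the column name `value`
# so that it lines up with df when the rows are stacked.
totals <- df %>%
  group_by(year, category) %>%
  summarise(value = sum(value), .groups = "drop")

# Stack the nine original rows on top of the three total rows.
combined <- bind_rows(df, totals)
combined
```

The result has 12 rows: the original 9 followed by one total row per group (17, 20 and 22). bind_rows() matches columns by name, which is why the total is stored in `value` rather than in a new `sum` column.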

Using aggregate

Base R solution
rbind( df, aggregate( value ~ year + category, df, sum ) )

   year category value
1  2005        a     3
2  2005        a     6
3  2005        a     8
4  2006        b     9
5  2006        b     7
6  2006        b     4
7  2007        c     5
8  2007        c     8
9  2007        c     9
10 2005        a    17
11 2006        b    20
12 2007        c    22
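
If the totals are wanted as an extra column rather than as extra rows, base R's ave() may do the job without any packages. This is only a sketch on the question's data; the column name `sum` is an arbitrary choice.

```r
df <- data.frame(
  year = c(2005, 2005, 2005, 2006, 2006, 2006, 2007, 2007, 2007),
  category = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  value = c(3, 6, 8, 9, 7, 4, 5, 8, 9),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each year/category group and returns a vector
# of the same length as the input, so it can be assigned as a new column.
df$sum <- ave(df$value, df$year, df$category, FUN = sum)
df
```

Each row keeps its original value and gains the total of its group (17 for 2005/a, 20 for 2006/b, 22 for 2007/c).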