R: how to sum columns grouped by a factor?

If I have a table like this:

user,v1,v2,v3
a,1,0,0
a,1,0,1
b,1,0,0
b,2,0,3
c,1,1,1

How can I turn it into this?

user,v1,v2,v3
a,2,0,1
b,3,0,3
c,1,1,1

You can do this with dplyr:

library(dplyr)
df = data.frame(
  user = c("a", "a", "b", "b", "c"),
  v1   = c(1, 1, 1, 2, 1),
  v2   = c(0, 0, 0, 0, 1),
  v3   = c(0, 1, 0, 3, 1))

group_by(df, user) %>% 
summarize(v1_sum = sum(v1),
          v2_sum = sum(v2),
          v3_sum = sum(v3))      

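If you are not familiar with the %>% notation, it works much like a pipe in bash: it takes the output of group_by() and passes it to summarize() as the first argument. The same thing can be written without the pipe: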

by_user = group_by(df, user)
df_summarized = summarize(by_user, 
                          v1_sum = sum(v1),
                          v2_sum = sum(v2),
                          v3_sum = sum(v3))  
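
Listing every column by hand gets tedious when there are many of them, and it also renames the columns to v1_sum, v2_sum, v3_sum rather than keeping v1, v2, v3 as in the desired output. A minimal sketch of one way around both, assuming dplyr 1.0.0 or later (where across() is available); the name `result` is only illustrative:

```r
library(dplyr)

# Same example data as above
df <- data.frame(
  user = c("a", "a", "b", "b", "c"),
  v1   = c(1, 1, 1, 2, 1),
  v2   = c(0, 0, 0, 0, 1),
  v3   = c(0, 1, 0, 3, 1)
)

# across(everything(), sum) applies sum() to every non-grouping column
# and keeps the original column names (v1, v2, v3)
result <- df %>%
  group_by(user) %>%
  summarise(across(everything(), sum))

print(result)
```

Inside summarise(), everything() skips the grouping column, so user is left alone and only v1, v2, v3 are summed.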

In base R:

D <- matrix(c(1, 0, 0,
              1, 0, 1,
              1, 0, 0,
              2, 0, 3,
              1, 1, 1),
            ncol=3, byrow=TRUE, dimnames=list(1:5, c("v1", "v2", "v3")))
D <- data.frame(user=c("a", "a", "b", "b", "c"), D)
aggregate(. ~ user, D, sum)

This returns:

> aggregate(. ~ user, D, sum)
  user v1 v2 v3
1    a  2  0  1
2    b  3  0  3
3    c  1  1  1
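
As a possible alternative in base R, rowsum() sums rows by group directly. The sketch below rebuilds the same data so it runs on its own; the name `sums` is only illustrative. Note that the groups end up as row names rather than in a user column:

```r
# Same example data as above
D <- data.frame(
  user = c("a", "a", "b", "b", "c"),
  v1   = c(1, 1, 1, 2, 1),
  v2   = c(0, 0, 0, 0, 1),
  v3   = c(0, 1, 0, 3, 1)
)

# Sum the numeric columns within each level of user;
# the result has one row per group, labelled by row name
sums <- rowsum(D[, c("v1", "v2", "v3")], group = D$user)

print(sums)
```

If a user column is needed, it can be recovered from rownames(sums); aggregate() as shown above gives that form directly.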