Conditional aggregation in R

Consider the following data frame:

d <- data.frame(c("a","a","a","a","b","b","b","b"),c("a1","a1","a2","a2","a1","a1","a2","a2"),"c","d",c(1:8))

I want to aggregate the values in column 5 so that I get the following data.frame:

d1 <- data.frame(c("a","a","b","b"),c("a1","a2","a1","a2"),"c","d",c(3,7,11,15))

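That is, I want to sum the values in column 5 according to the names in column 2 (within each value of column 1). At the same time, I want to keep the names in columns 1, 3 and 4 (in this example the names in columns 3 and 4 are the same, but in my real data they are different).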

How can I do this in R?
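For reference, base R's `aggregate()` can produce `d1` without extra packages. This is a minimal sketch, assuming the columns are named `v1` to `v5` (hypothetical names; the question's `data.frame()` call leaves them unnamed) and that columns 3 and 4 are constant within each column 1/column 2 pair, which lets them be used as extra grouping variables so they survive in the output:

```r
# Hypothetical column names v1..v5 (the question's data.frame() call leaves them unnamed)
d <- data.frame(v1 = c("a","a","a","a","b","b","b","b"),
                v2 = c("a1","a1","a2","a2","a1","a1","a2","a2"),
                v3 = "c", v4 = "d", v5 = 1:8)

# Sum v5 per combination of v1..v4; v3 and v4 are grouping variables only so that
# they are kept in the result (assumes they do not vary within a v1-v2 pair)
res <- aggregate(v5 ~ v1 + v2 + v3 + v4, data = d, FUN = sum)

# aggregate() lets the first grouping variable vary fastest; reorder to match d1
res <- res[order(res$v1, res$v2), ]
rownames(res) <- NULL
print(res)
```

If column 3 or 4 did vary within a pair, this would split the pair into several rows, so the grouping formula should then list only `v1 + v2`.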

With the tidyverse, you can group the data by your id variables and then summarize within those groups:

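library(tidyverse)

# The question's data.frame() call leaves the columns without usable names,
# so give them the names v1..v5 used below
names(d) <- paste0("v", 1:5)

d %>%
    group_by(v1, v2) %>%                  # one group per v1-v2 pair
    summarize(v3 = first(v3),             # v3 and v4 repeat within a group: keep one value
              v4 = first(v4),
              v5 = sum(v5))               # aggregate column 5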

Result:

# A tibble: 4 x 5
# Groups:   v1 [2]
  v1    v2    v3    v4       v5
  <fct> <fct> <fct> <fct> <int>
1 a     a1    c     d         3
2 a     a2    c     d         7
3 b     a1    c     d        11
4 b     a2    c     d        15

Calling first() is just one way to pick a single, arbitrary value from the columns whose values repeat within each group.
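When there are many such repeated columns, listing each one with first() gets verbose. A sketch of the same summary using `across()`, assuming dplyr 1.0.0 or later (where `across()` and the `.groups` argument exist) and the hypothetical column names `v1` to `v5`:

```r
library(dplyr)

# Hypothetical column names v1..v5, matching the answer above
d <- data.frame(v1 = c("a","a","a","a","b","b","b","b"),
                v2 = c("a1","a1","a2","a2","a1","a1","a2","a2"),
                v3 = "c", v4 = "d", v5 = 1:8)

res <- d %>%
  group_by(v1, v2) %>%
  summarize(across(c(v3, v4), first),   # keep the first value of every repeated column
            v5 = sum(v5),               # aggregate column 5
            .groups = "drop")           # return an ungrouped tibble
print(res)
```

`.groups = "drop"` removes the leftover `# Groups: v1 [2]` grouping that plain `summarize()` keeps.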

Using data.table:

Code

library(data.table)
# Per V1-V2 pair: keep the (repeated) V3 and V4 values and sum V5
d[, .(unique(V3), unique(V4), sum(V5)), .(V1, V2)]

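Specifically, the syntax follows `dt[i, j, by]`: `i` selects a subset of rows of the data.table, `j` declares the operations to perform on that subset as a list (`.()` is shorthand for `list()`), and `by` sets the grouping variables. In your case, you want to sum V5 for each V1-V2 pair. In addition, we apply unique() to V3 and V4 so that each group returns a single value for them rather than one per row. This assumes V3 and V4 are constant within each group; if they are not, unique() returns several values and the group yields several rows.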
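Because the elements of `.()` above are unnamed, data.table auto-names them V1, V2, V3, which is why the result below repeats the names V1 and V2. A sketch that names the `j` elements so the original column names are kept, using the same data as the answer:

```r
library(data.table)

d <- data.table(V1 = c("a","a","a","a","b","b","b","b"),
                V2 = c("a1","a1","a2","a2","a1","a1","a2","a2"),
                V3 = "c",
                V4 = "d",
                V5 = 1:8)

# Naming the elements of .() keeps V3, V4, V5 as column names in the result
res <- d[, .(V3 = unique(V3), V4 = unique(V4), V5 = sum(V5)), by = .(V1, V2)]
print(res)
```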

Result

   V1 V2 V1 V2 V3
1:  a a1  c  d  3
2:  a a2  c  d  7
3:  b a1  c  d 11
4:  b a2  c  d 15

Data

d = data.table(V1 = c("a","a","a","a","b","b","b","b"), 
                V2 = c("a1","a1","a2","a2","a1","a1","a2","a2"), 
                V3 = "c", 
                V4 = "d", 
                V5 = c(1:8))
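To see why the unique() approach depends on V3 and V4 being constant per group, here is a sketch with hypothetical data in which V3 differs inside the (a, a1) group. data.table recycles the length-1 results against the length-2 `unique(V3)`, so that group yields two rows; indexing with `[1L]` (the data.table counterpart of first()) keeps one row per group:

```r
library(data.table)

# Hypothetical data: V3 takes two values ("c" and "x") in the (a, a1) group
d2 <- data.table(V1 = c("a","a","a","a","b","b","b","b"),
                 V2 = c("a1","a1","a2","a2","a1","a1","a2","a2"),
                 V3 = c("c","x","c","c","c","c","c","c"),
                 V4 = "d",
                 V5 = 1:8)

# unique(V3) has length 2 in (a, a1): that group produces two rows, each with sum 3
with_unique <- d2[, .(V3 = unique(V3), V4 = unique(V4), V5 = sum(V5)), by = .(V1, V2)]
nrow(with_unique)  # 5

# Taking the first value guarantees exactly one row per V1-V2 pair
with_first <- d2[, .(V3 = V3[1L], V4 = V4[1L], V5 = sum(V5)), by = .(V1, V2)]
nrow(with_first)   # 4
```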