Summing the columns for every variable in a data frame by groups using R

Question

I have a data.frame as follows:

    table = data.frame(
      ID = c(rep("Be_01", 8), rep("Ce_02", 5)),
      Orig = c("Car", "Bus", "Truck", "Car", "Bus", "Car", "Bike", "Truck",
               "Car", "Truck", "Bus", "Bike", "Bike"),
      Orig_counts = c(5, 9, 8, 10, 14, 4, 8, 6, 10, 3, 9, 10, 6),
      Replace = c("Bike", "Truck", "Bus", "Truck", "Truck", "Bike", "Car", "Bus",
                  "Bike", "Bike", "Truck", "Car", "Car"),
      Replace_Count = c(9, 4, 2, 7, 10, 11, 12, 6, 7, 5, 9, 4, 2)
    )
> table
         ID  Orig Orig_counts Replace Replace_Count
      Be_01   Car           5    Bike             9
      Be_01   Bus           9   Truck             4
      Be_01 Truck           8     Bus             2
      Be_01   Car          10   Truck             7
      Be_01   Bus          14   Truck            10
      Be_01   Car           4    Bike            11
      Be_01  Bike           8     Car            12
      Be_01 Truck           6     Bus             6
      Ce_02   Car          10    Bike             7
      Ce_02 Truck           3    Bike             5
      Ce_02   Bus           9   Truck             9
      Ce_02  Bike          10     Car             4
      Ce_02  Bike           6     Car             2

For each ID, I want to sum Replace_Count over all rows that have "Car" in the Orig column and "Bike" in the Replace column, and vice versa. The output I want looks like this:

ID    Bike_and_Cars Cars_and_Bike
Be_01            12            20
Ce_02             6             7

Is it possible to do this with the aggregate function in R?

Answer 1

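You can do this with split-apply-combine. Here is a base R solution: split divides the data frame by ID, lapply summarizes each ID-specific subset, and do.call with rbind combines the per-ID summaries. The code assumes the question's data frame is stored as dat (the question itself calls it table).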

do.call(rbind, lapply(split(dat, dat$ID), function(x) {
  data.frame(ID=x$ID[1],
             Bike_and_Cars=sum(x$Replace_Count[x$Orig == "Bike" & x$Replace=="Car"]),
             Cars_and_Bike=sum(x$Replace_Count[x$Orig == "Car" & x$Replace == "Bike"]))
}))
#          ID Bike_and_Cars Cars_and_Bike
# Be_01 Be_01            12            20
# Ce_02 Ce_02             6             7
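To make the three stages of the split-apply-combine pipeline easier to inspect, the one-liner above can be unrolled into named steps. This is only a sketch: the name `dat` follows the answer's code, and `pair_sum` is a hypothetical helper introduced here for illustration, not part of the original answer.

```r
# Question's data, stored as `dat` to match the answer's code (assumed name)
dat <- data.frame(
  ID = c(rep("Be_01", 8), rep("Ce_02", 5)),
  Orig = c("Car", "Bus", "Truck", "Car", "Bus", "Car", "Bike", "Truck",
           "Car", "Truck", "Bus", "Bike", "Bike"),
  Orig_counts = c(5, 9, 8, 10, 14, 4, 8, 6, 10, 3, 9, 10, 6),
  Replace = c("Bike", "Truck", "Bus", "Truck", "Truck", "Bike", "Car", "Bus",
              "Bike", "Bike", "Truck", "Car", "Car"),
  Replace_Count = c(9, 4, 2, 7, 10, 11, 12, 6, 7, 5, 9, 4, 2)
)

# Split: a named list holding one data frame per ID
pieces <- split(dat, dat$ID)

# Hypothetical helper: sum Replace_Count for one Orig/Replace pairing
pair_sum <- function(x, orig, replace) {
  sum(x$Replace_Count[x$Orig == orig & x$Replace == replace])
}

# Apply: build a one-row summary for each piece
summaries <- lapply(pieces, function(x) {
  data.frame(ID = x$ID[1],
             Bike_and_Cars = pair_sum(x, "Bike", "Car"),
             Cars_and_Bike = pair_sum(x, "Car", "Bike"))
})

# Combine: stack the one-row summaries into a single data frame
result <- do.call(rbind, summaries)
print(result)
```

Inspecting `names(pieces)` or `summaries[["Be_01"]]` at each stage is a quick way to check that the grouping and the filter conditions do what is intended before combining.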

Answer 2

At the risk of not answering the exact question you asked, a more general approach may serve you better.

> aggregate(Replace_Count ~ ID + Orig + Replace, data=table, sum)
     ID  Orig Replace Replace_Count
1 Be_01   Car    Bike            20
2 Ce_02   Car    Bike             7
3 Ce_02 Truck    Bike             5
4 Be_01 Truck     Bus             8
5 Be_01  Bike     Car            12
6 Ce_02  Bike     Car             6
7 Be_01   Bus   Truck            14
8 Ce_02   Bus   Truck             9
9 Be_01   Car   Truck             7

From here it is easy to extract the subset of the data you are most interested in. One way is to create a combined column, for example:

table$Move <- with(table, paste0(Orig,"_and_",Replace))

Then spread the data into wide format with tidyr (you could also use reshape2):

library(tidyr)
spread(aggregate(Replace_Count ~ ID + Move, data=table, sum), Move, Replace_Count)
     ID Bike_and_Car Bus_and_Truck Car_and_Bike Car_and_Truck Truck_and_Bike Truck_and_Bus
1 Be_01           12            14           20             7             NA             8
2 Ce_02            6             9            7            NA              5            NA
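To get back to just the two columns the question asked for, the aggregated result can be filtered to the Car/Bike pairings before reshaping. The following is a minimal base R sketch under a few assumptions: the data frame is named `tab` here (a name chosen for this example so that it does not mask `base::table`), and `xtabs` stands in for the tidyr step, so missing combinations would show up as 0 rather than NA.

```r
# Question's data under the assumed name `tab`
tab <- data.frame(
  ID = c(rep("Be_01", 8), rep("Ce_02", 5)),
  Orig = c("Car", "Bus", "Truck", "Car", "Bus", "Car", "Bike", "Truck",
           "Car", "Truck", "Bus", "Bike", "Bike"),
  Orig_counts = c(5, 9, 8, 10, 14, 4, 8, 6, 10, 3, 9, 10, 6),
  Replace = c("Bike", "Truck", "Bus", "Truck", "Truck", "Bike", "Car", "Bus",
              "Bike", "Bike", "Truck", "Car", "Car"),
  Replace_Count = c(9, 4, 2, 7, 10, 11, 12, 6, 7, 5, 9, 4, 2)
)

# General aggregation, as in the answer: one row per ID/Orig/Replace combination
agg <- aggregate(Replace_Count ~ ID + Orig + Replace, data = tab, FUN = sum)

# Keep only the Car -> Bike and Bike -> Car pairings
keep <- (agg$Orig == "Car" & agg$Replace == "Bike") |
        (agg$Orig == "Bike" & agg$Replace == "Car")
pairs <- agg[keep, ]
pairs$Move <- paste0(pairs$Orig, "_and_", pairs$Replace)

# Cross-tabulate the sums: one row per ID, one column per pairing
wide <- xtabs(Replace_Count ~ ID + Move, data = pairs)
print(wide)
# Be_01: Bike_and_Car = 12, Car_and_Bike = 20
# Ce_02: Bike_and_Car = 6,  Car_and_Bike = 7
```

If tidyr is preferred for the reshaping step, note that its documentation describes `spread()` as superseded by `pivot_wider()`, which takes `names_from` and `values_from` arguments for the same purpose.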
