Mean of a column based on multiple conditions in R

I have a data frame:

DF <- data.frame(y1=c("AG","AG","AI","AI","AG","AI"),
      y0=c(2,2,1,1,2,1),
      y3=c(1994,1996,1997,1999,1994,1994),y4=c("AA","FB","AA","EB","AA","EB"),
      mw3wuus=c(26,34,22,21,65,78),
      Country_true=c("Antigua and  Barbuda","Antigua and  Barbuda","Anguilla","Anguilla","Antigua and  Barbuda","Anguilla"))

 DF
  y1 y0   y3 y4 mw3wuus         Country_true
1 AG  2 1994 AA      26 Antigua and  Barbuda
2 AG  2 1996 FB      34 Antigua and  Barbuda
3 AI  1 1997 AA      22             Anguilla
4 AI  1 1999 EB      21             Anguilla
5 AG  2 1994 AA      65 Antigua and  Barbuda
6 AI  1 1994 EB      78             Anguilla

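I am trying to create a new column holding the mean of mw3wuus, computed over rows whose other columns are equal.

In the example above, every row should keep its own value except rows 1 and 5: those two have the same values of y1, y0, y3, and y4, so I need the mean of mw3wuus across them.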

You might want to try aggregate.

For example:

aggregate(DF$mw3wuus, FUN=mean, 
          by=list(y1=DF$y1, y0=DF$y0, y3=DF$y3, y4=DF$y4))

which gives you:

  y1 y0   y3 y4    x
1 AG  2 1994 AA 45.5
2 AI  1 1997 AA 22.0
3 AI  1 1994 EB 78.0
4 AI  1 1999 EB 21.0
5 AG  2 1996 FB 34.0
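Note that aggregate collapses the data to one row per group. Since the question asks for a new column on the original data frame, here is a minimal base R sketch using ave, which returns one value per input row. The column name Mean is my own choice, and Country_true is left out of the sample data only to keep the example short.

```r
# Sample data from the question (Country_true omitted for brevity)
DF <- data.frame(
  y1 = c("AG", "AG", "AI", "AI", "AG", "AI"),
  y0 = c(2, 2, 1, 1, 2, 1),
  y3 = c(1994, 1996, 1997, 1999, 1994, 1994),
  y4 = c("AA", "FB", "AA", "EB", "AA", "EB"),
  mw3wuus = c(26, 34, 22, 21, 65, 78)
)

# ave() applies FUN within each combination of the grouping vectors
# and returns a vector aligned with the original rows.
# FUN must be passed by name, otherwise it is treated as another grouping vector.
DF$Mean <- ave(DF$mw3wuus, DF$y1, DF$y0, DF$y3, DF$y4, FUN = mean)

DF
```

Rows 1 and 5 should both get 45.5 in Mean, and every other row keeps its own mw3wuus value because it is alone in its group.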

Using data.table:

library(data.table)
setDT(DF)[, Mean := mean(mw3wuus), by = .(y1, y0, y3, y4)][]
#    y1 y0   y3 y4 mw3wuus         Country_true Mean
# 1: AG  2 1994 AA      26 Antigua and  Barbuda 45.5
# 2: AG  2 1996 FB      34 Antigua and  Barbuda 34.0
# 3: AI  1 1997 AA      22             Anguilla 22.0
# 4: AI  1 1999 EB      21             Anguilla 21.0
# 5: AG  2 1994 AA      65 Antigua and  Barbuda 45.5
# 6: AI  1 1994 EB      78             Anguilla 78.0

Or using the dplyr package:

library(dplyr)
DF %>% group_by(y1, y0, y3, y4) %>% summarise(x = mean(mw3wuus))
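summarise returns one row per group, like aggregate. If the goal is a new column alongside the original rows, as the data.table answer produces, a possible variant is to use mutate instead. This is only a sketch: the names Mean and DF_with_mean are assumptions of mine, and Country_true is omitted from the sample data for brevity.

```r
library(dplyr)

# Sample data from the question (Country_true omitted for brevity)
DF <- data.frame(
  y1 = c("AG", "AG", "AI", "AI", "AG", "AI"),
  y0 = c(2, 2, 1, 1, 2, 1),
  y3 = c(1994, 1996, 1997, 1999, 1994, 1994),
  y4 = c("AA", "FB", "AA", "EB", "AA", "EB"),
  mw3wuus = c(26, 34, 22, 21, 65, 78)
)

# mutate() on grouped data computes the mean per group
# but keeps all six original rows; ungroup() drops the grouping afterwards.
DF_with_mean <- DF %>%
  group_by(y1, y0, y3, y4) %>%
  mutate(Mean = mean(mw3wuus)) %>%
  ungroup()

DF_with_mean
```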