Mean of column based on multiple conditions in R
I have a data frame:
DF <- data.frame(y1 = c("AG","AG","AI","AI","AG","AI"),
                 y0 = c(2,2,1,1,2,1),
                 y3 = c(1994,1996,1997,1999,1994,1994),
                 y4 = c("AA","FB","AA","EB","AA","EB"),
                 mw3wuus = c(26,34,22,21,65,78),
                 Country_true = c("Antigua and Barbuda","Antigua and Barbuda","Anguilla","Anguilla","Antigua and Barbuda","Anguilla"))
DF
  y1 y0   y3 y4 mw3wuus        Country_true
1 AG  2 1994 AA      26 Antigua and Barbuda
2 AG  2 1996 FB      34 Antigua and Barbuda
3 AI  1 1997 AA      22            Anguilla
4 AI  1 1999 EB      21            Anguilla
5 AG  2 1994 AA      65 Antigua and Barbuda
6 AI  1 1994 EB      78            Anguilla
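I am trying to create a new column holding the mean of a variable, computed over the rows whose other columns are equal.
For example, in the sample above every row is unique except rows 1 and 5: they have the same values of y1, y0, y3, and y4, so I need the mean of mw3wuus across those two rows.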
You may want to try aggregate, for example:
aggregate(DF$mw3wuus, FUN=mean,
by=list(y1=DF$y1, y0=DF$y0, y3=DF$y3, y4=DF$y4))
which gives you:
  y1 y0   y3 y4    x
1 AG  2 1994 AA 45.5
2 AI  1 1997 AA 22.0
3 AI  1 1994 EB 78.0
4 AI  1 1999 EB 21.0
5 AG  2 1996 FB 34.0
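Note that aggregate returns one row per group rather than adding a column to DF. If the goal is a new column alongside the original rows, as the question describes, one base-R option is ave(), which returns a vector the same length as its input. A minimal sketch, assuming the same DF as in the question; the column name Mean is only an illustrative choice:

```r
# Rebuild the example data so the snippet runs on its own
DF <- data.frame(y1 = c("AG","AG","AI","AI","AG","AI"),
                 y0 = c(2,2,1,1,2,1),
                 y3 = c(1994,1996,1997,1999,1994,1994),
                 y4 = c("AA","FB","AA","EB","AA","EB"),
                 mw3wuus = c(26,34,22,21,65,78))

# ave() applies FUN within each combination of the grouping vectors
# and returns the result aligned with the original rows
DF$Mean <- ave(DF$mw3wuus, DF$y1, DF$y0, DF$y3, DF$y4, FUN = mean)

DF$Mean
# [1] 45.5 34.0 22.0 21.0 45.5 78.0
```

Rows 1 and 5 both receive 45.5, the mean of 26 and 65, while every other row keeps its own value because it is the only member of its group.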
Using data.table:
library(data.table)
setDT(DF)[, Mean := mean(mw3wuus), by = .(y1, y0, y3, y4)][]
#    y1 y0   y3 y4 mw3wuus        Country_true Mean
# 1: AG  2 1994 AA      26 Antigua and Barbuda 45.5
# 2: AG  2 1996 FB      34 Antigua and Barbuda 34.0
# 3: AI  1 1997 AA      22            Anguilla 22.0
# 4: AI  1 1999 EB      21            Anguilla 21.0
# 5: AG  2 1994 AA      65 Antigua and Barbuda 45.5
# 6: AI  1 1994 EB      78            Anguilla 78.0
Or using the dplyr package:
library(dplyr)
DF %>% group_by(y1, y0, y3, y4) %>% summarise(x = mean(mw3wuus))
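summarise collapses each group to a single row. To keep all six rows and attach the group mean as a new column, which is closer to what the question asks for, mutate can be used in place of summarise. A minimal sketch, assuming dplyr is installed; the names result and Mean are illustrative choices, not part of the original answer:

```r
library(dplyr)

# Rebuild the example data so the snippet runs on its own
DF <- data.frame(y1 = c("AG","AG","AI","AI","AG","AI"),
                 y0 = c(2,2,1,1,2,1),
                 y3 = c(1994,1996,1997,1999,1994,1994),
                 y4 = c("AA","FB","AA","EB","AA","EB"),
                 mw3wuus = c(26,34,22,21,65,78))

# mutate() on a grouped data frame computes the mean per group
# but keeps one output row per input row
result <- DF %>%
  group_by(y1, y0, y3, y4) %>%
  mutate(Mean = mean(mw3wuus)) %>%
  ungroup()

result$Mean
# [1] 45.5 34.0 22.0 21.0 45.5 78.0
```

The trailing ungroup() is there so that later operations on result are not silently applied per group.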