Merging data by group in R
I have constructed the following data.frame object:
name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)
which produces data =
    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie        NA FEMALE
First I clean the data with
clean_data <- data[!is.na(data$incidents), ]
so that clean_data =
    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
Now I aggregate by gender:
agg <- aggregate(incidents ~ gender, clean_data, mean)
which yields
  gender incidents
1 FEMALE      19.0
2   MALE     687.5
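As an aside (a point not raised in the original thread): the formula method of aggregate() uses na.action = na.omit by default, so rows with NA in incidents are dropped before the group means are computed, and the separate cleaning step is not strictly required for this aggregation. A minimal sketch comparing the two on the question's data:

```r
name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)

# aggregate on the cleaned data, as in the question
clean_data <- data[!is.na(data$incidents), ]
agg_clean <- aggregate(incidents ~ gender, clean_data, mean)

# aggregate on the raw data: the default na.action = na.omit
# removes the NA row before the group means are taken
agg_raw <- aggregate(incidents ~ gender, data, mean)

# compare the computed means from both approaches
all.equal(agg_clean$incidents, agg_raw$incidents)
# [1] TRUE
```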
Now I would like to "fill in" the NA values in incidents with the data from agg, so that data =
    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie      19.0 FEMALE
What is the simplest way to do this in base R?
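You can use ave. It returns the group "mean" values ("vals") in the same order as the rows of the original dataset; then find the "NA" elements in the "incidents" column and replace them with the corresponding elements of "vals".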
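# group-wise mean of incidents, aligned row-by-row with data
vals <- with(data, ave(incidents, gender, FUN = function(x)
  mean(x, na.rm = TRUE)))
# rows where incidents is missing
indx1 <- is.na(data$incidents)
# overwrite only those rows with their group's mean
data$incidents[indx1] <- vals[indx1]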
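To make the ave() step concrete: ave() returns a vector the same length as incidents in which every element is replaced by its group's mean, which is why vals can be indexed with the same logical vector as data$incidents. A minimal sketch on the question's data, printing the intermediate vals:

```r
name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)

# one group mean per row, in the original row order
vals <- with(data, ave(incidents, gender,
                       FUN = function(x) mean(x, na.rm = TRUE)))
print(vals)
# [1] 687.5  19.0 687.5  19.0  19.0

# fill only the missing rows from the aligned vector
indx1 <- is.na(data$incidents)
data$incidents[indx1] <- vals[indx1]
print(data)
```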
A shorter version, shown by @MrFlick in the comments, uses "ifelse" to replace the "NA" elements with the "mean" value.
data$incidents <- with(data, ave(incidents, gender,
  FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)))
Instead of "ifelse", "replace" could also be used, as @Ananda Mahto shows with "data.table".
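Following that remark, here is a sketch of the same base R ave() one-liner with replace() substituted for ifelse(). replace() writes the group mean only at the positions where is.na(x) is TRUE and returns the other values unchanged:

```r
name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)

# within each gender group, replace NA entries with the group mean
data$incidents <- with(data, ave(incidents, gender,
  FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))))
print(data)
```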
For variety, here is an approach using "data.table", which also demonstrates the replace function.
library(data.table)
as.data.table(data)[
, incidents := replace(incidents, is.na(incidents),
mean(incidents, na.rm = TRUE)),
by = gender][]
#      name incidents gender
# 1:  Homer       133   MALE
# 2:  Marge        36 FEMALE
# 3:   Bart      1242   MALE
# 4:   Lisa         2 FEMALE
# 5: Maggie        19 FEMALE
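Since the question already computes agg, another base R option (not one of the answers above) is to look up each missing row's group mean in agg with match(). A minimal sketch, assuming agg is built exactly as in the question; na_rows is just an illustrative variable name:

```r
name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)

# group means, computed as in the question
clean_data <- data[!is.na(data$incidents), ]
agg <- aggregate(incidents ~ gender, clean_data, mean)

# rows whose incidents value is missing
na_rows <- is.na(data$incidents)
# for each such row, find its gender in agg and take that group's mean
data$incidents[na_rows] <- agg$incidents[match(data$gender[na_rows], agg$gender)]
print(data)
```

Unlike merge(), this keeps the original row order. If some gender had no non-missing incidents at all, it would be absent from agg, match() would return NA, and those rows would simply stay NA.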