Merging data by group in R

I have constructed the following data.frame object:

name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)

which yields data =

    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie        NA FEMALE

First I clean the data with

clean_data <- data[!is.na(incidents), ]

so that clean_data =

   name incidents gender
1 Homer       133   MALE
2 Marge        36 FEMALE
3  Bart      1242   MALE
4  Lisa         2 FEMALE

Now I aggregate by gender

agg <- aggregate(incidents ~ gender, clean_data, mean)

yielding

  gender incidents
1 FEMALE      19.0
2   MALE     687.5

Now I would like to "fill in" the NA value in incidents with the data from agg, so that data =

    name incidents gender
1  Homer       133   MALE
2  Marge        36 FEMALE
3   Bart      1242   MALE
4   Lisa         2 FEMALE
5 Maggie      19.0 FEMALE

What is the simplest way to do this using base R?

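You can use ave. It returns the per-group "mean" values ("vals") in the same order as the rows of the original dataset. Then check which elements of the "incidents" column are "NA" and replace them with the corresponding elements of "vals".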

 vals <- with(data, ave(incidents, gender, FUN= function(x)
                                         mean(x, na.rm=TRUE)))
 indx1 <- is.na(data$incidents)
 data$incidents[indx1] <- vals[indx1]
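
If you would rather reuse the agg table from the question than recompute the means with ave, a lookup with match is one possible base R route. This is only a sketch on the example data: the name idx is made up for illustration, and agg is built directly from data, relying on the formula interface of aggregate dropping NA rows by default.

```r
# Rebuild the example data from the question
name <- c("Homer", "Marge", "Bart", "Lisa", "Maggie")
incidents <- c(133, 36, 1242, 2, NA)
gender <- c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
data <- data.frame(name, incidents, gender)

# Group means; the formula method uses na.action = na.omit by default,
# so the row with a missing incidents value is left out of the mean
agg <- aggregate(incidents ~ gender, data, mean)

# idx (illustrative name) marks the rows that need filling
idx <- is.na(data$incidents)

# For each missing row, find its gender in agg and copy that group's mean
data$incidents[idx] <- agg$incidents[match(data$gender[idx], agg$gender)]
data
```

A gender that has no non-missing incidents at all will not appear in agg, so match returns NA for it and those rows stay NA.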

A shorter version, as shown by @MrFlick in the comments. It uses "ifelse" to replace the "NA" elements with the "mean" value of their group.

 data$incidents <- with(data, ave(incidents, gender,
          FUN = function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x)))

Instead of "ifelse", "replace" can also be used, as shown by @Ananda Mahto in the "data.table" solution.
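
For reference, the "replace" variant could look roughly like this in base R. It is a sketch on the example data, and fill_na_mean is a hypothetical helper name used only for illustration.

```r
# Example data from the question
data <- data.frame(
  name = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)

# fill_na_mean (hypothetical helper): swap the NA entries of x
# for the mean of the non-missing entries of x
fill_na_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# ave applies the helper within each gender group and keeps the row order
data$incidents <- with(data, ave(incidents, gender, FUN = fill_na_mean))
data
```

If a group has no non-missing values, mean(x, na.rm = TRUE) is NaN, so that group's entries become NaN rather than a number.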

For variety, here is an approach using "data.table", which also demonstrates the replace function.

library(data.table)
as.data.table(data)[
  , incidents := replace(incidents, is.na(incidents), 
                         mean(incidents, na.rm = TRUE)), 
  by = gender][]
#      name incidents gender
# 1:  Homer       133   MALE
# 2:  Marge        36 FEMALE
# 3:   Bart      1242   MALE
# 4:   Lisa         2 FEMALE
# 5: Maggie        19 FEMALE
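
Note that as.data.table() works on a copy, so the call above prints the filled table but leaves the original data object as it was. If you want data itself updated, one option (a sketch, assuming the data.table package is installed and that turning data into a data.table is acceptable) is to convert it by reference with setDT first:

```r
library(data.table)

# Example data from the question
data <- data.frame(
  name = c("Homer", "Marge", "Bart", "Lisa", "Maggie"),
  incidents = c(133, 36, 1242, 2, NA),
  gender = c("MALE", "FEMALE", "MALE", "FEMALE", "FEMALE")
)

# Convert the data.frame to a data.table in place (no copy is made)
setDT(data)

# := updates the incidents column by reference, one gender group at a time
data[, incidents := replace(incidents, is.na(incidents),
                            mean(incidents, na.rm = TRUE)),
     by = gender]
data
```

Alternatively, keep the as.data.table() version and assign its result to a name of your choice.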