Aggregate with na.action=na.pass gives unexpected answer

Question

Take the following data.frame as an example:

d <- data.frame(x=c(1,NA), y=c(2,3))

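I want to summarize the values of y by the variable x. Since no two rows share a value of x, I expected aggregate to simply return the original data.frame, with NA treated as its own group. Instead, aggregate gives me the following result.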

>aggregate(y ~ x, data=d, FUN=sum)
  x y
1 1 2

I have read the documentation about changing the default na.action, but doing so makes no difference to the result.

>aggregate(y ~ x, data=d, FUN=sum, na.action=na.pass)
  x y
1 1 2

What is going on here? I don't understand what na.pass is doing in this case. Is there an option in R that does what I want? Any help would be appreciated.

Answer 1

aggregate uses tapply, which in turn uses factor on its grouping variable.

But look at what happens to NA values in a factor:

factor(c(1, 2, NA))
# [1] 1    2    <NA>
# Levels: 1 2

Note the levels: NA is not one of them, so those rows belong to no group. You can use addNA to keep NA as a level:

addNA(factor(c(1, 2, NA)))
# [1] 1    2    <NA>
# Levels: 1 2 <NA>
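
To see the effect of the missing level on grouping directly, here is a minimal sketch that calls tapply on the question's data, once with the raw column and once with addNA. The variable names by_default and by_addna are made up for illustration.

```r
d <- data.frame(x = c(1, NA), y = c(2, 3))

# Raw grouping vector: tapply converts it with as.factor(), so NA is not a
# level and the row with x = NA falls into no group.
by_default <- tapply(d$y, d$x, sum)

# addNA() makes NA an explicit level, so that row forms its own group.
by_addna <- tapply(d$y, addNA(d$x), sum)

length(by_default)  # number of groups without addNA
length(by_addna)    # number of groups with addNA
```

The only difference between the two calls is the set of levels of the grouping factor, which is what the workaround relies on.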

So you may need to do something like this:

aggregate(y ~ addNA(x), d, sum)
#   addNA(x) y
# 1        1 2
# 2     <NA> 3

Or something like this:

d$x <- addNA(factor(d$x))
str(d)
# 'data.frame': 2 obs. of  2 variables:
#  $ x: Factor w/ 2 levels "1",NA: 1 2
#  $ y: num  2 3
aggregate(y ~ x, d, sum)
#      x y
# 1    1 2
# 2 <NA> 3
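
If you would rather not overwrite d$x, the same idea should also work through the non-formula interface of aggregate, by passing the addNA-wrapped column in by. This is a sketch under that assumption and is not part of the approach above; the name res is made up for illustration.

```r
d <- data.frame(x = c(1, NA), y = c(2, 3))

# Group by an addNA-wrapped copy of x; d itself is left unchanged.
res <- aggregate(d["y"], by = list(x = addNA(d$x)), FUN = sum)

nrow(res)        # one row per group, including the NA group
is.numeric(d$x)  # the original column is still numeric
```

Wrapping the column at the call site keeps x numeric for any later arithmetic, at the cost of repeating addNA in every call.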

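(Alternatively, upgrade to something like "data.table", which is not only faster than aggregate but also treats NA values more consistently. You also don't have to care whether you are using the formula method of aggregate or not.)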

library(data.table)
as.data.table(d)[, sum(y), by = x]
#     x V1
# 1:  1  2
# 2: NA  3
