aggregate(df, ...) returning NAs?
I want to apply the aggregate function to this data frame, grouping by the variables "id" and "var1":
df <- structure(list (id = c(1L,1L,1L,1L,2L,2L,2L,2L),
var1 = structure(c(1L,1L,2L,2L,1L,1L,2L,2L),
.Label = c("A", "B"), class = "factor"),
var2 = c(1L,2L,1L,2L,1L,2L,1L,2L),
values = c(37L,20L,22L,18L,30L,5L,41L,50L)),
.Names = c("id","var1","var2","values"),
class = "data.frame", row.names = c(NA,-8L))
# looks like
> df
id var1 var2 values
1 1 A 1 37
2 1 A 2 20
3 1 B 1 22
4 1 B 2 18
5 2 A 1 30
6 2 A 2 5
7 2 B 1 41
8 2 B 2 50
But if I do this, I get a lot of warnings and a column full of NAs:
> agg <- aggregate(df, by=list(df$id, df$var1), mean)
Warning messages:
1: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
4: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
> agg
Group.1 Group.2 id var1 var2 values
1 1 A 1 NA 1.5 28.5
2 2 A 2 NA 1.5 17.5
3 1 B 1 NA 1.5 20.0
4 2 B 2 NA 1.5 45.5
Is there a way to avoid these warnings? Have I lost any data in my aggregated result because of them?
Try this:
aggregate(. ~ id + var1, data = df, FUN = mean)
# id var1 var2 values
#1 1 A 1.5 28.5
#2 2 A 1.5 17.5
#3 1 B 1.5 20.0
#4 2 B 1.5 45.5
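The . on the left-hand side of the formula means "every column not used for grouping". If you only want some columns averaged, you can name a single column there, or bind several with cbind(). A small sketch, assuming the question's data rebuilt with data.frame() for brevity; agg_values and agg_both are illustrative names:

```r
# Same data as in the question, built with data.frame() for brevity
df <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  var1   = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
  var2   = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  values = c(37L, 20L, 22L, 18L, 30L, 5L, 41L, 50L)
)

# One response column: put its name on the left-hand side
agg_values <- aggregate(values ~ id + var1, data = df, FUN = mean)

# Several response columns: bind them with cbind() on the left-hand side
agg_both <- aggregate(cbind(var2, values) ~ id + var1, data = df, FUN = mean)
print(agg_both)
```

One difference worth knowing: the formula method drops rows that have NA in any of the variables it uses (its default na.action is na.omit), whereas the data-frame method passes NAs through to FUN.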
Here are some other options.
Using dplyr:
library(dplyr)
df %>% group_by(id, var1) %>% summarise(var2 = mean(var2), values = mean(values))
# or apply mean to every non-grouping column (the older summarise_each(funs(mean)) form is deprecated)
df %>% group_by(id, var1) %>% summarise(across(everything(), mean))
# id var1 var2 values
#1 1 A 1.5 28.5
#2 1 B 1.5 20.0
#3 2 A 1.5 17.5
#4 2 B 1.5 45.5
Using data.table, you have two options:
library(data.table)
setDT(df)[, .(var2 = mean(var2), values = mean(values)), by = .(id, var1)] # option 1
setDT(df)[, lapply(.SD, mean), by=.(id,var1), .SDcols=c("var2","values")] # option 2
# id var1 var2 values
#1: 1 A 1.5 28.5
#2: 1 B 1.5 20.0
#3: 2 A 1.5 17.5
#4: 2 B 1.5 45.5
Using ddply from plyr:
library(plyr)
ddply(df, .(id,var1), colwise(mean))
# id var1 var2 values
#1 1 A 1.5 28.5
#2 1 B 1.5 20.0
#3 2 A 1.5 17.5
#4 2 B 1.5 45.5
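You need to restrict the data frame you supply as argument x to the columns you want FUN applied to. In your example you want to apply mean to the values column, grouped by id and var1, so you need to pass df$values rather than just df: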
agg <- aggregate(df$values, by=list(df$id, df$var1), mean)
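With a bare vector as x, the result column is simply called x. You can instead pass a data frame of just the columns you want, and give the by list names so the grouping columns are labelled id and var1 rather than Group.1 and Group.2. A sketch, assuming the same data rebuilt with data.frame(); agg_vec and agg_named are illustrative names:

```r
# Same data as in the question, built with data.frame() for brevity
df <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  var1   = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
  var2   = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  values = c(37L, 20L, 22L, 18L, 30L, 5L, 41L, 50L)
)

# A bare vector as x: the aggregated column is named "x"
agg_vec <- aggregate(df$values, by = list(df$id, df$var1), FUN = mean)
print(names(agg_vec))

# Selected columns as x, plus a named by list so the
# grouping columns are called id / var1 instead of Group.1 / Group.2
agg_named <- aggregate(df[c("var2", "values")],
                       by = list(id = df$id, var1 = df$var1),
                       FUN = mean)
print(agg_named)
```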
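Because your first argument (x = df) asks aggregate to apply FUN to every column of df, including the factor var1, not just the single column values.
You want x = df$values,
or use the formula interface, as the other answers suggest.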
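On the question of lost data: the warnings all come from mean() being applied to var1, which is a factor. mean() returns NA for it once per group (four groups, four warnings). The numeric columns are averaged as usual, so nothing is lost; the var1 column is just meaningless and id only repeats Group.1. A small sketch that checks this, assuming the same data rebuilt with data.frame(); numeric_cols, agg_num and agg_all are illustrative names:

```r
# Same data as in the question, built with data.frame() for brevity
df <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  var1   = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
  var2   = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  values = c(37L, 20L, 22L, 18L, 30L, 5L, 41L, 50L)
)

# Which columns can mean() average? A factor is not numeric.
numeric_cols <- vapply(df, is.numeric, logical(1))
print(numeric_cols)

# mean() on the factor column is what raises the warning (and yields NA)
msg <- tryCatch(mean(df$var1), warning = conditionMessage)
print(msg)

# Keep only numeric columns before aggregating: no warnings, no NA column
# (id is still included here, so it is averaged to itself)
agg_num <- aggregate(df[numeric_cols], by = list(df$id, df$var1), FUN = mean)

# The original, warning-producing call, with the warnings silenced
agg_all <- suppressWarnings(aggregate(df, by = list(df$id, df$var1), FUN = mean))

# The averaged values are the same either way
print(all.equal(agg_num$values, agg_all$values))
```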