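Error when renaming columns of a data frame produced by the aggregate function
I don't understand why I can't rename the columns of a data frame produced by the aggregate function.
I use aggregate to compute some summary statistics by group. A reproducible example follows.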
> data <- read.table(header=T, text='
+ subject sex condition before after change
+ 1 F placebo 10.1 6.9 -3.2
+ 2 F placebo 6.3 4.2 -2.1
+ 3 M aspirin 12.4 6.3 -6.1
+ 4 F placebo 8.1 6.1 -2.0
+ 5 M aspirin 15.2 9.9 -5.3
+ 6 F aspirin 10.9 7.0 -3.9
+ 7 F aspirin 11.6 8.5 -3.1
+ 8 M aspirin 9.5 3.0 -6.5
+ 9 F placebo 11.5 9.0 -2.5
+ 10 M placebo 11.9 11.0 -0.9
+ 11 F aspirin 11.4 8.0 -3.4
+ 12 M aspirin 10.0 4.4 -5.6
+ 13 M aspirin 12.5 5.4 -7.1
+ 14 M placebo 10.6 10.6 0.0
+ 15 M aspirin 9.1 4.3 -4.8
+ 16 F placebo 12.1 10.2 -1.9
+ 17 F placebo 11.0 8.8 -2.2
+ 18 F placebo 11.9 10.2 -1.7
+ 19 M aspirin 9.1 3.6 -5.5
+ 20 M placebo 13.5 12.4 -1.1
+ 21 M aspirin 12.0 7.5 -4.5
+ 22 F placebo 9.1 7.6 -1.5
+ 23 M placebo 9.9 8.0 -1.9
+ 24 F placebo 7.6 5.2 -2.4
+ 25 F placebo 11.8 9.7 -2.1
+ 26 F placebo 11.8 10.7 -1.1
+ 27 F aspirin 10.1 7.9 -2.2
+ 28 M aspirin 11.6 8.3 -3.3
+ 29 F aspirin 11.3 6.8 -4.5
+ 30 F placebo 10.3 8.3 -2.0
+ ')
>
> summary.function <- function(x){c(mean(abs(x)),mean(x),min(x),max(x))}
> data.summary <- aggregate(data=data,change~condition+sex,FUN=summary.function)
> data.summary
condition sex change.1 change.2 change.3 change.4
1 aspirin F 3.420000 -3.420000 -4.500000 -2.200000
2 placebo F 2.058333 -2.058333 -3.200000 -1.100000
3 aspirin M 5.411111 -5.411111 -7.100000 -3.300000
4 placebo M 0.975000 -0.975000 -1.900000 0.000000
> colnames(data.summary) <- c("condition","sex","absmean","mean","min","max")
Error in `colnames<-`(`*tmp*`, value = c("condition", "sex", "absmean", :
'names' attribute [6] must be the same length as the vector [3]
The colnames() function returns unexpected column names:
> colnames(data.summary)
[1] "condition" "sex" "change"
Can anyone help me?
Edit:
After trying packages other than base R, I found that this also works with doBy:
library(doBy)
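data.summary <- summaryBy(change ~ sex + condition, data=data, FUN=summary.function)
# the grouping columns appear in formula order (sex, then condition), so the names must match that order
colnames(data.summary) <- c("sex","condition","absmean","mean","min","max")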
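This can be done with data.table. We convert the 'data.frame' to a 'data.table' (setDT(data)), group by the 'condition' and 'sex' columns, apply summary.function to 'change', and convert the result to a list. The advantage is that the output has 6 regular columns instead of the matrix output of aggregate (as @PierreLafortune mentioned in the comments), i.e. a data.frame with 2 regular columns and one matrix column (which can be converted to a regular data.frame with do.call(data.frame, ..)). In addition, we can name the elements directly inside summary.function (I changed summary.function slightly). Even with the modified summary.function, the aggregate column names will still carry the change. prefix, which we may want to remove afterwards.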
library(data.table)
setDT(data)[, as.list(summary.function(change)) , by = .(condition, sex)]
# condition sex absmean mean min max
#1: placebo F 2.058333 -2.058333 -3.2 -1.1
#2: aspirin M 5.411111 -5.411111 -7.1 -3.3
#3: aspirin F 3.420000 -3.420000 -4.5 -2.2
#4: placebo M 0.975000 -0.975000 -1.9 0.0
where
summary.function <- function(x){c(absmean=mean(abs(x)),mean=mean(x),
min=min(x),max=max(x))}
The problem lies in the aggregate output.
data.summary <- aggregate(data=data, change~condition+sex,FUN=summary.function)
str(data.summary)
#'data.frame': 4 obs. of 3 variables:
# $ condition: Factor w/ 2 levels "aspirin","placebo": 1 2 1 2
# $ sex : Factor w/ 2 levels "F","M": 1 1 2 2
# $ change : num [1:4, 1:4] 3.42 2.058 5.411 0.975 -3.42 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr "absmean" "mean" "min" "max"
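Here there are only 3 columns, and the third column, 'change', is a matrix. That is why assigning 6 names fails. We can convert the result to a regular data.frame: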
data.summary <- do.call(data.frame, data.summary)
str(data.summary)
#'data.frame': 4 obs. of 6 variables:
#$ condition : Factor w/ 2 levels "aspirin","placebo": 1 2 1 2
#$ sex : Factor w/ 2 levels "F","M": 1 1 2 2
#$ change.absmean: num 3.42 2.058 5.411 0.975
#$ change.mean : num -3.42 -2.058 -5.411 -0.975
#$ change.min : num -4.5 -3.2 -7.1 -1.9
#$ change.max : num -2.2 -1.1 -3.3 0
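The structure described above can be checked on a much smaller case. The following is a minimal sketch using hypothetical toy data (the names df, stats, agg and flat are illustrative and not part of the original question); it only shows that a vector-valued FUN leaves aggregate with a single matrix column until the result is flattened.

```r
# Toy data (hypothetical, not the data from the question)
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, -3, 2, 4)
)

# FUN returns a named vector of length 2 for each group
stats <- function(x) c(mean = mean(x), max = max(x))

agg <- aggregate(value ~ group, data = df, FUN = stats)

length(agg)           # 2 top-level columns: group and value
is.matrix(agg$value)  # TRUE: both statistics live in one matrix column
dim(agg$value)        # 2 rows (groups) by 2 columns (statistics)
colnames(agg$value)   # "mean" "max", taken from the vector returned by FUN

# Flattening splices the matrix into ordinary columns
flat <- do.call(data.frame, agg)
names(flat)           # "group" "value.mean" "value.max"
```

Because length(agg) counts the matrix as one column, colnames<- expects as many names as there are top-level columns, not one name per statistic.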
Change the column names by removing the prefix:
# the dot must be escaped with a double backslash inside an R string literal
names(data.summary) <- sub('[^.]+\\.', '', names(data.summary))
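As an alternative to stripping the prefix afterwards, the matrix column can be bound next to the grouping columns directly, so the statistic names are kept as they are. This is only a sketch on hypothetical toy data (df, stats, agg and flat are illustrative names), assuming FUN returns a named vector as in the modified summary.function.

```r
# Toy data (hypothetical, not the data from the question)
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, -3, 2, 4)
)

# Named return values become the matrix column names
stats <- function(x) c(mean = mean(x), max = max(x))

agg <- aggregate(value ~ group, data = df, FUN = stats)

# Bind the grouping column to the matrix itself; since the matrix is passed
# unnamed, its own column names are used without a "value." prefix
flat <- cbind(agg["group"], agg$value)
names(flat)   # "group" "mean" "max"
```

With several grouping variables, the same idea would apply by selecting all of them in the first argument, for example agg[c("condition", "sex")] for the data in the question.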