Sum columns by group (row names) in a matrix
Suppose I have a matrix named x.
x <- structure(c(1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1),
.Dim = c(5L, 4L), .Dimnames = list(c("Cake", "Pie", "Cake", "Pie", "Pie"),
c("Mon", "Tue", "Wed", "Thurs")))
x
Mon Tue Wed Thurs
Cake 1 0 1 1
Pie 0 0 1 1
Cake 1 1 0 1
Pie 0 0 1 1
Pie 0 0 1 1
I would like to sum each column, grouped by row name:
Mon Tue Wed Thurs
Cake 2 1 1 2
Pie 0 0 3 3
I tried addmargins(x), but that only gives the total of each column and each row. Any suggestions? I searched other questions but could not figure it out.
You can try this:
df <- read.table(header = TRUE, text = "
Name Mon Tue Wed Thurs
Cake 1 0 1 1
Pie 0 0 1 1
Cake 1 1 0 1
Pie 0 0 1 1
Pie 0 0 1 1")
aggregate(. ~ Name, data=df, FUN=sum)
## Name Mon Tue Wed Thurs
## 1 Cake 2 1 1 2
## 2 Pie 0 0 3 3
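If retyping the data is not convenient, the same idea can probably be applied to the matrix from the question directly. The following is a minimal sketch, assuming x is the matrix defined in the question; the grouping label Name is just a name chosen here for illustration.

```r
# The matrix from the question
x <- structure(c(1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1),
               .Dim = c(5L, 4L),
               .Dimnames = list(c("Cake", "Pie", "Cake", "Pie", "Pie"),
                                c("Mon", "Tue", "Wed", "Thurs")))

# Group the rows by their row names and sum every column within each group.
# "Name" is only an illustrative label for the grouping column.
res <- aggregate(x, by = list(Name = rownames(x)), FUN = sum)
res
```

Note that the result is a data frame with the group in a Name column, not a matrix with row names.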
There is also dplyr:
library(dplyr)
group_by(df, Name) %>%
summarise(Mon = sum(Mon), Tue = sum(Tue), Wed = sum(Wed), Thurs = sum(Thurs))
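Or better, without naming each column (in current dplyr, where summarise_each() and funs() are deprecated, this is written with across()):
group_by(df, Name) %>%
summarise(across(everything(), sum))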
An approach using plyr:
library(plyr)
ldply(split(df, df$Name), function(u) colSums(u[-1]))
# .id Mon Tue Wed Thurs
#1 Cake 2 1 1 2
#2 Pie 0 0 3 3
Data:
df = structure(list(Name = structure(c(1L, 2L, 1L, 2L, 2L), .Label = c("Cake",
"Pie"), class = "factor"), Mon = c(1L, 0L, 1L, 0L, 0L), Tue = c(0L,
0L, 1L, 0L, 0L), Wed = c(1L, 1L, 0L, 1L, 1L), Thurs = c(1L, 1L,
1L, 1L, 1L)), .Names = c("Name", "Mon", "Tue", "Wed", "Thurs"
), row.names = c(NA, -5L), class = "data.frame")
Here is a vectorized base R solution, applied to the matrix x from the question:
rowsum(x, row.names(x))
# Mon Tue Wed Thurs
# Cake 2 1 1 2
# Pie 0 0 3 3
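One detail of rowsum() that may matter in practice: by default the result rows are sorted by group name, while reorder = FALSE keeps the groups in the order they first appear. A small sketch, where x_rev is a made-up variation of x (same rows, reversed) used only to make the difference visible:

```r
# The matrix from the question
x <- structure(c(1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1),
               .Dim = c(5L, 4L),
               .Dimnames = list(c("Cake", "Pie", "Cake", "Pie", "Pie"),
                                c("Mon", "Tue", "Wed", "Thurs")))

# Hypothetical variation: the same rows in reverse order, so "Pie" comes first
x_rev <- x[nrow(x):1, ]

# Default: groups are returned in sorted order (Cake, Pie)
sorted <- rowsum(x_rev, rownames(x_rev))

# reorder = FALSE: groups are returned in first-seen order (Pie, Cake)
as_found <- rowsum(x_rev, rownames(x_rev), reorder = FALSE)
```

The sums themselves are the same in both cases; only the row order of the result differs.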
Or a data.table version that uses keep.rownames = TRUE to turn your row names into a column (named rn):
library(data.table)
as.data.table(x, keep.rownames = TRUE)[, lapply(.SD, sum), by = rn]
# rn Mon Tue Wed Thurs
# 1: Cake 2 1 1 2
# 2: Pie 0 0 3 3