How to repeatedly summarise a column, grouping by each of several other columns
Suppose I want to compute the mean (or a custom function) of column A for each distinct value in columns B through D. Here is the data:
Input:
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
B = round(runif(20,min = 0,max = 1),0),
C = round(runif(20,min = 0,max = 1),0),
D = round(runif(20,min = 0,max = 1),0))
Desired output (your random numbers may give a different summary table):
col value mean
B 0 5.92
B 1 4.71
C 0 6
C 1 5.17
D 0 4.89
D 1 6
I can do this for each column separately:
data %>% group_by(B) %>% summarise(mean(A))
I put it in a for loop:
p <- data.frame(NULL)
for(i in c('B','C','D')){
q <- data %>% group_by_(i) %>% summarise(col=i,mean = mean(A))
p <- append(p,q)
}
But it does not work as expected. Any suggestions would be helpful.
One option is to gather the data into 'long' format, group by the 'key' and 'val' columns, and take the mean of 'A':
library(tidyverse)
gather(data, key, val, B:D) %>%
group_by(key, val) %>%
summarise(A = mean(A))
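In current tidyr, gather is superseded by pivot_longer. The same reshape-then-group idea might look like the sketch below, assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 are installed. The output column names col, value and mean are chosen here to match the table in the question; they are not required by either package.

```r
library(dplyr)
library(tidyr)

# Same construction as the question's data, with a seed for reproducibility
set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

res <- data %>%
  # stack B, C, D into two columns: the source column name and its value
  pivot_longer(B:D, names_to = "col", values_to = "value") %>%
  group_by(col, value) %>%
  # swap mean(A) for any function that returns a single value per group
  summarise(mean = mean(A), .groups = "drop")

res
```

To use a custom function instead of the mean, replace mean(A) inside summarise with a call such as median(A) or your own function of A.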
Or in base R, unlist columns 'B' through 'D' into a single 'group' column, replicate the column names alongside it as 'cN', and aggregate 'A' by both:
aggregate(A ~ ., cbind(data['A'], cN = names(data)[-1][col(data[-1])],
group = unlist(data[-1])), mean)
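The one-liner above is dense, so here is a more explicit sketch of the same base R idea, building the long table step by step. The names grp_cols, long, col and value are illustrative only and do not appear in the original answer.

```r
set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

grp_cols <- c("B", "C", "D")  # hypothetical helper: the columns to group by

long <- data.frame(
  # A repeated once per grouping column
  A     = rep(data$A, times = length(grp_cols)),
  # the name of the column each stacked value came from
  col   = rep(grp_cols, each = nrow(data)),
  # B, C and D stacked end to end, in that order
  value = unlist(data[grp_cols], use.names = FALSE)
)

# mean of A for every (col, value) pair
res <- aggregate(A ~ col + value, data = long, FUN = mean)
res[order(res$col, res$value), ]
```

This relies on unlist concatenating the data frame's columns in order, so the rep(..., each = nrow(data)) labels line up with the stacked values.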
Data:
set.seed(24)
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
B = round(runif(20,min = 0,max = 1),0),
C = round(runif(20,min = 0,max = 1),0),
D = round(runif(20,min = 0,max = 1),0))
Another option, using base R and the reshape2 package, would be:
library(reshape2)  # provides melt(); the Var1/Var2 column names below are reshape2's defaults
data <- data.frame(A = round(runif(20,min = 0,max = 10),0),
B = round(runif(20,min = 0,max = 1),0),
C = round(runif(20,min = 0,max = 1),0),
D = round(runif(20,min = 0,max = 1),0))
melt(t(apply(data[,-1],2,function(x) by(data[,1],x,mean))))
Var1 Var2 value
1 B 0 4.100000
2 C 0 3.727273
3 D 0 4.250000
4 B 1 4.800000
5 C 1 5.333333
6 D 1 4.583333
The melt and t functions are only there to get the output into the shape you want.
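As for why the loop in the question misbehaves: append treats its data frame arguments as plain lists, so it concatenates columns into a list rather than stacking rows, and group_by_ is deprecated in current dplyr. A minimal sketch of a repaired loop, assuming dplyr >= 1.0.0, could collect one summary per column and bind the rows at the end. The names pieces and value are illustrative.

```r
library(dplyr)

set.seed(24)
data <- data.frame(A = round(runif(20, min = 0, max = 10), 0),
                   B = round(runif(20, min = 0, max = 1), 0),
                   C = round(runif(20, min = 0, max = 1), 0),
                   D = round(runif(20, min = 0, max = 1), 0))

# one small summary table per grouping column
pieces <- lapply(c("B", "C", "D"), function(i) {
  data %>%
    # .data[[i]] looks the column up by its name held in the string i
    group_by(value = .data[[i]]) %>%
    summarise(col = i, mean = mean(A), .groups = "drop") %>%
    select(col, value, mean)
})

# stack the per-column summaries row-wise
p <- bind_rows(pieces)
p
```

Building a list and binding once also avoids growing a data frame inside the loop.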