Summarize a data table individually for multiple columns
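I am trying to summarize data across multiple columns automatically, if possible, rather than writing code for each column individually. I want to turn this: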
Patch Size Achmil Aciarv Aegpod Agrcap
A 10 0 1 1 0
B 2 1 0 0 0
C 2 1 0 0 0
D 2 1 0 0 0
into this:
Species Presence MaxSize MeanSize Count
Achmil 0 10 10 1
Achmil 1 2 2 3
Aciarv 0 2 2 3
Aciarv 1 10 10 1
I know I can run group_by and summarise separately for each column:
achmil <- group_by(LimitArea, Achmil) %>%
  summarise(SumA = mean(Size))
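But is there a way to use some kind of loop to run this automatically for the presence and absence values of every column? Any help is appreciated.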
Perhaps we need to gather the data into long format and then summarise:
library(tidyverse)
gather(df1, Species, Presence, Achmil:Agrcap) %>%
  group_by(Species, Presence) %>%
  summarise(MaxSize = max(Size), MeanSize = mean(Size), Count = n())
# A tibble: 7 x 5
# Groups: Species [?]
# Species Presence MaxSize MeanSize Count
# <chr> <int> <dbl> <dbl> <int>
#1 Achmil 0 10.0 10.0 1
#2 Achmil 1 2.00 2.00 3
#3 Aciarv 0 2.00 2.00 3
#4 Aciarv 1 10.0 10.0 1
#5 Aegpod 0 2.00 2.00 3
#6 Aegpod 1 10.0 10.0 1
#7 Agrcap 0 10.0 4.00 4
In newer versions of tidyr (1.0.0 and later), we can use pivot_longer instead of gather:
df1 %>%
  pivot_longer(cols = Achmil:Agrcap, names_to = "Species",
               values_to = "Presence") %>%
  group_by(Species, Presence) %>%
  summarise(MaxSize = max(Size), MeanSize = mean(Size), Count = n())
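For anyone who wants to run the pivot_longer pipeline end to end, here is a minimal sketch. It rebuilds the question's table as df1 with data.frame(), which is an assumption, since the original post does not show how the data was loaded. It also adds .groups = "drop" (available in dplyr 1.0.0 and later) so the result comes back ungrouped; that argument is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question (construction is an assumption;
# the original post does not show how the table was read in)
df1 <- data.frame(
  Patch  = c("A", "B", "C", "D"),
  Size   = c(10, 2, 2, 2),
  Achmil = c(0L, 1L, 1L, 1L),
  Aciarv = c(1L, 0L, 0L, 0L),
  Aegpod = c(1L, 0L, 0L, 0L),
  Agrcap = c(0L, 0L, 0L, 0L)
)

result <- df1 %>%
  # one row per Patch/Species pair, with the 0/1 flag stored in Presence
  pivot_longer(cols = Achmil:Agrcap, names_to = "Species",
               values_to = "Presence") %>%
  group_by(Species, Presence) %>%
  # .groups = "drop" returns a plain, ungrouped tibble
  summarise(MaxSize = max(Size), MeanSize = mean(Size), Count = n(),
            .groups = "drop")

print(result)
```

Because Agrcap is absent from every patch, it contributes only a Presence = 0 row, so the result has seven rows rather than eight.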
Here is another solution using aggregate (together with reshape2::melt()):
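library(reshape2)
# drop the first column (Patch), then melt with Size as the id variable
df = melt(df[, 2:ncol(df)], "Size")
# summarise Size for every species (variable) and presence flag (value)
aggregate(. ~ variable + value, data = df,
          FUN = function(x) c(max = max(x), mean = mean(x), count = length(x)))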
variable value Size.max Size.mean Size.count
1 Achmil 0 10 10 1
2 Aciarv 0 2 2 3
3 Aegpod 0 2 2 3
4 Agrcap 0 10 4 4
5 Achmil 1 2 2 3
6 Aciarv 1 10 10 1
7 Aegpod 1 10 10 1
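Since the question asks specifically about "some kind of loop", the same summary can also be sketched in base R with lapply over the species columns. This is not from the original answers. The helper summarise_one and the variable species_cols are hypothetical names introduced here, and the sketch assumes every column other than Patch and Size is a 0/1 species flag.

```r
# Example data copied from the question (construction is an assumption)
df1 <- data.frame(
  Patch  = c("A", "B", "C", "D"),
  Size   = c(10, 2, 2, 2),
  Achmil = c(0L, 1L, 1L, 1L),
  Aciarv = c(1L, 0L, 0L, 0L),
  Aegpod = c(1L, 0L, 0L, 0L),
  Agrcap = c(0L, 0L, 0L, 0L)
)

# Assumption: everything except Patch and Size is a species column
species_cols <- setdiff(names(df1), c("Patch", "Size"))

# Hypothetical helper: summarise Size by presence/absence for one column
summarise_one <- function(col) {
  # split Size into one vector per distinct flag value ("0", "1")
  pieces <- split(df1$Size, df1[[col]])
  data.frame(
    Species   = col,
    Presence  = as.integer(names(pieces)),
    MaxSize   = vapply(pieces, max, numeric(1)),
    MeanSize  = vapply(pieces, mean, numeric(1)),
    Count     = lengths(pieces),
    row.names = NULL
  )
}

# Loop over the species columns and stack the per-column summaries
by_species <- do.call(rbind, lapply(species_cols, summarise_one))
print(by_species)
```

The reshaping approaches above are usually shorter, but the explicit loop makes it easy to add per-column logic later.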