How to aggregate by the number of rows
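The goal is to aggregate the observations over blocks of a fixed number of rows.
To illustrate, here is the sample data (dput() output):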
df <- structure(list(observation = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1)), class = "data.frame", row.names = c(NA,
-20L), variable.labels = structure(character(0), .Names = character(0)), codepage = 65001L)
Visually, this is:
╔═════════════╗
║ observation ║
╠═════════════╣
║ 1 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 1 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 1 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 0 ║
╠═════════════╣
║ 1 ║
╚═════════════╝
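The end goal is to aggregate over a specified number of rows (10 rows in the example output below), reporting the count of 1s and the mean for each block. The output would look like: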
╔═══════╦══════╗
║ count ║ mean ║
╠═══════╬══════╣
║ 3 ║ 0.3 ║
╠═══════╬══════╣
║ 1 ║ 0.1 ║
╚═══════╩══════╝
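For reference, the target computation can be sketched in base R as a small helper. The name aggregate_by_rows() and its arguments are hypothetical (not taken from any of the answers), and "count" is computed as sum(x), which equals the number of 1s only because the observations are coded 0/1.

```r
# Hypothetical helper: summarise a 0/1 vector in consecutive chunks of n rows.
aggregate_by_rows <- function(x, n) {
  # Chunk index: rows 1..n -> 1, rows (n+1)..2n -> 2, and so on
  grp <- (seq_along(x) - 1L) %/% n + 1L
  data.frame(
    count = as.vector(tapply(x, grp, sum)),   # number of 1s (x is assumed 0/1)
    mean  = as.vector(tapply(x, grp, mean))   # proportion of 1s in the chunk
  )
}

x <- c(1, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
res <- aggregate_by_rows(x, 10)
print(res)
```

The chunk size is a parameter here, so the same call works for blocks of 5, 20, or any other number of rows.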
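You can try the code below:
# ceiling(seq(nrow(df)) / 10) labels rows 1-10 as chunk 1 and rows 11-20 as chunk 2;
# tapply() summarises each chunk into a one-row data frame, and rbind stacks them
do.call(
  rbind,
  tapply(
    df$observation,
    ceiling(seq(nrow(df)) / 10),
    function(x) data.frame(count = sum(x), mean = mean(x))
  )
)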
which gives
count mean
1 3 0.3
2 1 0.1
A tidyverse solution: start a new group wherever row_number() modulo 10 equals 1, then summarise each group:
library(tidyverse)
df %>%
  # a new chunk starts at rows 1, 11, 21, ...
  mutate(rn = cumsum(row_number() %% 10 == 1)) %>%
  group_by(rn) %>%
  summarise(count = sum(observation),
            mean = mean(observation))
rn count mean
<int> <dbl> <dbl>
1 1 3 0.3
2 2 1 0.1
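The cumsum() trick can be checked in base R without dplyr. This sketch assumes 20 rows and chunks of 10, and confirms that the key matches the ceiling() key from the tapply() answer. It also shows one edge case: with a chunk size of 1, i %% 1 is always 0, so no row ever starts a chunk and everything collapses into a single group.

```r
# Base R equivalent of the tidyverse grouping key (20 rows, chunks of 10)
i  <- seq_len(20)
rn <- cumsum(i %% 10 == 1)   # TRUE at rows 1 and 11, so the running count steps up there
print(rn)

# Same chunks as ceiling(i / 10)
print(identical(as.numeric(rn), ceiling(i / 10)))

# Edge case: chunk size 1 never satisfies i %% 1 == 1
rn1 <- cumsum(i %% 1 == 1)
print(unique(rn1))           # a single group labelled 0
```

A key of the form (row_number() - 1) %/% n does not have this edge case, since it works for any positive chunk size n.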
Using data.table:
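library(data.table)
# setDT() converts df to a data.table by reference;
# gl(nrow(df), 10, nrow(df)) labels rows 1-10 as 1 and rows 11-20 as 2
setDT(df)[, .(count = sum(observation), mean = mean(observation)),
          .(grp = as.integer(gl(nrow(df), 10, nrow(df))))][, grp := NULL][]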
-output
# count mean
#1: 3 0.3
#2: 1 0.1
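The chunk index in the data.table answer comes from base R's gl(n, k, length), which generates a factor whose n levels are each repeated k times, truncated to length. This sketch shows the index for the 20-row example and for a hypothetical 25-row table, where the last chunk holds only 5 rows.

```r
# gl(n, k, length): n levels, each repeated k times, cut to `length` values
g20 <- as.integer(gl(20, 10, 20))
print(g20)          # ten 1s followed by ten 2s

# Hypothetical 25-row table: the third chunk is shorter
g25 <- as.integer(gl(25, 10, 25))
print(table(g25))   # chunk sizes: 10, 10, 5
```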