R - How to average every n rows within ID with long data?
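I have a large data frame in long format with about 13k rows. It is mostly numeric but also contains some string variables. Here is sample data showing what it looks like.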
id <- c(101,101,101,101,101,101,101,101,101,
102,102,102,102,102,102,
103,103,103,103,103,103,103,103,103,103,103)
color <- c("red","red","red","red","red","red","red","red","red",
"blue","blue","blue","blue","blue","blue",
"green","green","green","green","green","green","green","green","green","green","green")
time <- c(1:9, 1:6, 1:11)
var1 <- sample(1:3, 26, replace=TRUE)
var2 <- sample(1:3, 26, replace=TRUE)
var3 <- sample(1:3, 26, replace=TRUE)
df <- data.frame(id,color,time,var1,var2,var3)
id color time var1 var2 var3
1 101 red 1 1 1 3
2 101 red 2 1 3 1
3 101 red 3 1 3 1
4 101 red 4 1 3 3
5 101 red 5 2 2 3
6 101 red 6 1 1 2
7 101 red 7 1 3 1
8 101 red 8 1 2 1
9 101 red 9 1 2 2
10 102 blue 1 1 1 1
11 102 blue 2 1 3 2
12 102 blue 3 1 1 1
13 102 blue 4 3 2 1
14 102 blue 5 1 3 2
15 102 blue 6 2 1 1
16 103 green 1 3 1 3
17 103 green 2 2 3 2
18 103 green 3 2 3 1
19 103 green 4 1 1 3
20 103 green 5 2 3 2
21 103 green 6 3 3 2
22 103 green 7 3 2 3
23 103 green 8 3 1 2
24 103 green 9 3 1 2
25 103 green 10 3 2 1
26 103 green 11 1 2 1
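I would like to smooth this time-series data frame by averaging every n rows (for example, every 3 rows): time, var1, var2 and var3 should be averaged within id, while id and color stay unchanged. The final result should have 1 row for every 3 original rows. However, the only solution I have come up with so far is to split the data frame by id into a list of data frames and then use aggregate on each new data frame, but that is impractical for my real data and causes the color variable to become NA. Other attempts result in errors because the n I chose does not always divide the rows evenly (for example, averaging every 3 rows when id 103 has 11 rows in total). Is there a more practical solution?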
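We can create a block index within each 'id' with gl, group by it together with 'id' and 'color', and then summarise across 'time' and the columns that start with 'var'. The index is built in a separate mutate step after the first group_by, because expressions written inside group_by are evaluated on the ungrouped data, so n() there would be the row count of the whole data frame rather than of each id.
library(dplyr)
df %>%
  group_by(id, color) %>%
  # number consecutive blocks of 3 rows, restarting within each id
  mutate(grp = as.integer(gl(n(), 3, n()))) %>%
  group_by(id, color, grp) %>%
  summarise(across(c(time, starts_with('var')), mean), .groups = 'drop')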
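For readers who want to check the behaviour when a group size is not a multiple of n, here is a minimal sketch on a small deterministic data frame. The data values, the helper `block_index`, and the names `n_rows` and `n_obs` are assumptions made for illustration and do not come from the original post. The block index is computed in its own `mutate()` step so that it restarts within each id, and integer division is used in place of `gl()`, which should give the same numbering.

```r
library(dplyr)

# Hypothetical example data: id 101 has 6 rows, id 102 has 5 rows,
# so the last block of id 102 holds only 2 rows.
df <- data.frame(
  id    = c(rep(101, 6), rep(102, 5)),
  color = c(rep("red", 6), rep("blue", 5)),
  time  = c(1:6, 1:5),
  var1  = c(1, 2, 3, 1, 1, 1, 3, 3, 3, 2, 1)
)

n_rows <- 3  # assumed block size

# Block number for each of `len` rows: 1,1,1,2,2,2,... (illustrative helper)
block_index <- function(len, size) (seq_len(len) - 1) %/% size + 1

smoothed <- df %>%
  group_by(id, color) %>%
  mutate(grp = block_index(n(), n_rows)) %>%   # n() is the size of each id group here
  group_by(id, color, grp) %>%
  summarise(across(c(time, starts_with("var")), mean),
            n_obs = n(),                        # how many rows went into each average
            .groups = "drop")

print(smoothed)
```

Keeping `n_obs` makes short trailing blocks visible (here the second block of id 102 averages 2 rows instead of 3), so they can be filtered out or down-weighted later if that suits the analysis.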