Rolling sum in dplyr
set.seed(123)
df <- data.frame(x = sample(1:10, 20, replace = TRUE), id = rep(1:2, each = 10))
For each id, I want to create a column containing the sum of the previous 5 x values.
library(dplyr)
df %>% group_by(id) %>% mutate(roll.sum = c(x[1:4], zoo::rollapply(x, 5, sum)))
# Groups: id [2]
x id roll.sum
<int> <int> <int>
3 1 3
8 1 8
5 1 5
9 1 9
10 1 10
1 1 36
6 1 39
9 1 40
6 1 41
5 1 37
10 2 10
5 2 5
7 2 7
6 2 6
2 2 2
9 2 39
3 2 32
1 2 28
4 2 25
10 2 29
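Row 6 should be 35 (3 + 8 + 5 + 9 + 10), row 7 should be 33 (8 + 5 + 9 + 10 + 1), and so on.
However, the function above also includes the current row in the calculation. How can I fix this?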
The tibbletime package has a rollify function that you can use. You can read about it in this vignette: Rolling calculations in tibbletime.
library(tibbletime)
library(dplyr)
rollig_sum <- rollify(.f = sum, window = 5)
df %>%
  group_by(id) %>%
  mutate(roll.sum = lag(rollig_sum(x)))  # added lag() here so the current row is excluded
# A tibble: 20 x 3
# Groups: id [2]
# x id roll.sum
# <int> <int> <int>
# 1 3 1 NA
# 2 8 1 NA
# 3 5 1 NA
# 4 9 1 NA
# 5 10 1 NA
# 6 1 1 35
# 7 6 1 33
# 8 9 1 31
# 9 6 1 35
#10 5 1 32
#11 10 2 NA
#12 5 2 NA
#13 7 2 NA
#14 6 2 NA
#15 2 2 NA
#16 9 2 30
#17 3 2 29
#18 1 2 27
#19 4 2 21
#20 10 2 19
If you want the NA values to be something else, you can use if_else:
df %>%
  group_by(id) %>%
  mutate(roll.sum = lag(rollig_sum(x))) %>%
  mutate(roll.sum = if_else(is.na(roll.sum), x, roll.sum))
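To see why lag() fixes the off-by-one, here is a minimal sketch on a plain vector. It swaps in zoo::rollapplyr for rollify (a substitution for illustration, not the code from this answer) and hard-codes the id 1 values from the output above, since sample() can return different numbers under other R versions. The names incl and excl are made up for this example.

```r
library(zoo)

# x values for id 1, copied from the output above
x <- c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5)

# Right-aligned window of 5: each sum ends AT the current row
incl <- rollapplyr(x, 5, sum, fill = NA)
incl
# NA NA NA NA 35 33 31 35 32 27

# Shifting by one row makes each sum end just BEFORE the current row
excl <- dplyr::lag(incl)
excl
# NA NA NA NA NA 35 33 31 35 32
```

Row 6 now holds 35 (3 + 8 + 5 + 9 + 10), which is what the question asks for.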
library(zoo)
library(dplyr)
# width = list(-(1:5)) is a set of offsets: the 5 rows before the current one
df %>% group_by(id) %>%
  mutate(Sum_prev = rollapply(x, list(-(1:5)), sum, fill = NA, align = "right", partial = FALSE))
# you can use rollapply(x, list(1:5), sum, fill = NA, align = "left", partial = FALSE)
# to sum the next 5 elements, skipping the current one
x id Sum_prev
1 3 1 NA
2 8 1 NA
3 5 1 NA
4 9 1 NA
5 10 1 NA
6 1 1 35
7 6 1 33
8 9 1 31
9 6 1 35
10 5 1 32
11 10 2 NA
12 5 2 NA
13 7 2 NA
14 6 2 NA
15 2 2 NA
16 9 2 30
17 3 2 29
18 1 2 27
19 4 2 21
20 10 2 19
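If you want to sanity-check these numbers without any extra packages, a base-R sketch using cumulative sums may help. The helper name prev_sum and its n argument are hypothetical, and the x values are copied from the output above rather than regenerated with sample().

```r
# Sum of the n values strictly before each position (NA where fewer than n exist)
prev_sum <- function(x, n = 5) {
  cs  <- c(0, cumsum(x))               # cs[i] is the sum of x[1:(i - 1)]
  out <- rep(NA_real_, length(x))
  idx <- seq_along(x)
  ok  <- idx > n                       # rows that have n full preceding values
  out[ok] <- cs[idx[ok]] - cs[idx[ok] - n]
  out
}

x1 <- c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5)    # id 1
x2 <- c(10, 5, 7, 6, 2, 9, 3, 1, 4, 10)   # id 2

prev_sum(x1)
# NA NA NA NA NA 35 33 31 35 32

# Apply per group with base R's ave()
df2 <- data.frame(x = c(x1, x2), id = rep(1:2, each = 10))
df2$Sum_prev <- ave(df2$x, df2$id, FUN = prev_sum)
```

The grouped result should match the Sum_prev column shown above; inside a dplyr pipeline the same helper could be called as mutate(Sum_prev = prev_sum(x)) after group_by(id).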