前瞻性滚动中的元素总和 window 按月

Sum of elements in a forward looking rolling window by month

我有以下 data.frame 列:Id、Month、have

library(dplyr)

dt <- read.table(header = TRUE, text = '
Id  Month    have   want
1   01-Jan-2018  1.000000000000000   1.234567901220000 
1   01-Feb-2018  0.200000000000000   0.234567901233000 
1   01-Mar-2018  0.030000000000000   0.034567901234400 
1   01-Apr-2018  0.004000000000000   0.004567901234550 
1   01-May-2018  0.000500000000000   0.000567901234566 
1   01-Jun-2018  0.000060000000000   0.000067901234566 
1   01-Jul-2018  0.000007000000000   0.000007901234566 
1   01-Aug-2018  0.000000800000000   0.000000901234566 
1   01-Sep-2018  0.000000090000000   0.000000101234566 
1   01-Oct-2018  0.000000010000000   0.000000011234566 
1   01-Nov-2018  0.000000001100000   0.000000001234566 
1   01-Dec-2018  0.000000000120000   0.000000000134566 
1   01-Jan-2019  0.000000000013000   0.000000000014566 
1   01-Feb-2019  0.000000000001400   0.000000000001566 
1   01-Mar-2019  0.000000000000150   0.000000000000166 
1   01-Apr-2019  0.000000000000016   0.000000000000016 
2   01-Jan-2018 1337.00 1338.00
2   01-Feb-2018 1.00    1.00
3   01-Jan-2018  5.000000000000000000    5.000000000000000 
') %>% mutate(Month=as.Date(Month, format='%d-%b-%Y')

我想以编程方式计算 12 个月前瞻性滚动 window 中元素的总和,按月份 Id 分组,如第 want 列所示。如果滚动观察window小于12个月,则忽略缺失的元素。

对于奖励积分,解决方案还允许缺少月份,例如:

dt <- read.table(header = TRUE, text = '
Id  Month    have   want
1   01-Jan-18    1.000000000000000   1.200000000000000 
1   01-Dec-18    0.200000000000000   0.230000000000000 
1   01-Jan-19    0.030000000000000   0.030000000000000 
') %>% mutate(Month=as.Date(Month, format='%d-%b-%Y')

我尝试过不同的解决方案,例如zoo包的rollapplyr()和runner包中的一些函数,但是好像没有给我我需要的。

您可以将 zoorollaplypartial = TRUE

一起使用
library(dplyr)

dt %>%
  group_by(Id) %>%
  tidyr::complete(Month = seq(min(Month), max(Month), "month")) %>%
  mutate(result = zoo::rollapply(have, 12, sum, na.rm = TRUE, 
                                 align = 'left', partial = TRUE)) -> result
result

如果您每个月都有每个 Id 的数据,就像共享示例中那样,您可以删除 complete 步骤。

我建议在这种情况下使用 runner 包。 runner 函数让你计算滚动 window 在时间上有完全的控制。 k 是 window 长度,lag 是 window 的滞后,在 idx 中指定 window 所依赖的索引列。

library(runner)

dt %>%
  group_by(Id) %>%
  mutate(want2 = runner(
    .,
    f = function(x) sum(x$have),
    k = 12,    # or "12 months"
    lag = -11, # or "-11 months"
    idx = Month)
  )
# # A tibble: 19 x 5
# # Groups:   Id [3]
#    Id Month          have     want    want2
# <int> <date>        <dbl>    <dbl>    <dbl>
#   1     1 2018-01-01 1.00e+ 0 1.23e+ 0 1.00e+ 0
#   2     1 2018-02-01 2.00e- 1 2.35e- 1 2.00e- 1
#   3     1 2018-03-01 3.00e- 2 3.46e- 2 3.00e- 2
#   4     1 2018-04-01 4.00e- 3 4.57e- 3 4.00e- 3
#   5     1 2018-05-01 5.00e- 4 5.68e- 4 5.00e- 4
#   6     1 2018-06-01 6.00e- 5 6.79e- 5 6.00e- 5