Simple operation with lagged values

I need to compute a simple row-wise operation on lagged values, for example the sum of a variable over the previous x years.

I tried:

toy %>% 
  group_by(student) %>% 
  mutate(lag_passed = sum(lag(passed, n = 5, order_by = year, default = 0)))

toy %>% 
  group_by(student) %>% 
  arrange(year) %>% 
  mutate(lag_passed = lapply(passed, function(x) sum(lag(x, n = 5, default = 0))))

Reproducible example. The task is to sum the number of tests passed over the previous five years.

toy <- data.frame(student = rep("A",10),
                  year=c(1:10), 
                  passed=c(0,0,0,1,2,0,0,1,0,1))

   student year passed
1        A    1      0
2        A    2      0
3        A    3      0
4        A    4      1
5        A    5      2
6        A    6      0
7        A    7      0
8        A    8      1
9        A    9      0
10       A   10      1

expected <- data.frame(student = rep("A",10),
year=c(1:10), 
passed=c(0,0,0,1,2,0,0,1,0,1), 
lag_passed=c(0,0,0,0,1,3,3,3,4,3))

   student year passed lag_passed
1        A    1      0          0
2        A    2      0          0
3        A    3      0          0
4        A    4      1          0
5        A    5      2          1
6        A    6      0          3
7        A    7      0          3
8        A    8      1          3
9        A    9      0          4
10       A   10      1          3


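runner::sum_run() will help here. Passing idx = year is optional unless some years are missing from your data; in that case it makes the window account for the missing years as well, though that is not the case in the sample data. I added the grouping by student because in practice you will probably want to do this per student.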

toy <- data.frame(student = rep("A",10),
                  year=c(1:10), 
                  passed=c(0,0,0,1,2,0,0,1,0,1))
library(dplyr)

library(runner)
toy %>% group_by(student) %>%
  mutate(lag_passed = sum_run(x = passed,
                             idx = year,
                             k = 5,
                             lag = 1))
#> # A tibble: 10 x 4
#> # Groups:   student [1]
#>    student  year passed lag_passed
#>    <chr>   <int>  <dbl>      <dbl>
#>  1 A           1      0         NA
#>  2 A           2      0          0
#>  3 A           3      0          0
#>  4 A           4      1          0
#>  5 A           5      2          1
#>  6 A           6      0          3
#>  7 A           7      0          3
#>  8 A           8      1          3
#>  9 A           9      0          4
#> 10 A          10      1          3

Created on 2021-05-15 by the reprex package (v2.0.0)
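
Note that the output above has NA in the first row, whereas the expected result in the question has 0 there. A minimal sketch, assuming the NA only means that there are no earlier years to sum, is to wrap the call in dplyr::coalesce(). The replacement value 0 and the name res are my own choices for illustration, picked to match the expected column:

```r
library(dplyr)
library(runner)

toy <- data.frame(student = rep("A", 10),
                  year = 1:10,
                  passed = c(0, 0, 0, 1, 2, 0, 0, 1, 0, 1))

res <- toy %>%
  group_by(student) %>%
  # same sum_run() call as above; coalesce() swaps the leading NA for 0
  mutate(lag_passed = coalesce(
    sum_run(x = passed, idx = year, k = 5, lag = 1),
    0)) %>%
  ungroup()

res
```

If an NA should instead signal "no history available", leave the original call unchanged.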

Another rolling-sum solution, using zoo::rollapply:

f <- function(x) {zoo::rollapply(x, 6, sum, align = 'right', partial = TRUE) - x}

expected %>% 
    group_by(student) %>% 
    arrange(year) %>% 
    mutate(lag_passed2 = f(passed)) %>%
    ungroup()

#    student  year passed lag_passed lag_passed2
#    <chr>   <int>  <dbl>      <dbl>       <dbl>
#  1 A           1      0          0           0
#  2 A           2      0          0           0
#  3 A           3      0          0           0
#  4 A           4      1          0           0
#  5 A           5      2          1           1
#  6 A           6      0          3           3
#  7 A           7      0          3           3
#  8 A           8      1          3           3
#  9 A           9      0          4           4
# 10 A          10      1          3           3
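lag_passed2, created with the helper function f, is identical to lag_passed. The idea is to compute a sliding-window sum with a window of length 6 (partial = TRUE allows partial windows at the start, and align = 'right' makes each window end at the current row), then subtract the current year's value of passed.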
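
To make the "window of 6 minus the current value" idea concrete, here is a rough base R sketch of the same arithmetic using cumulative sums. It is not how zoo computes it internally, and the names cs, start, win6 and lag_passed3 are hypothetical, introduced only for this illustration:

```r
passed <- c(0, 0, 0, 1, 2, 0, 0, 1, 0, 1)
k <- 5                      # number of previous years to sum
n <- length(passed)

# cs[i] holds the sum of passed[1..(i-1)], so cs[1] is 0
cs <- c(0, cumsum(passed))

# first row of the window of width k + 1 ending at row i,
# clipped at 1 (this plays the role of partial = TRUE)
start <- pmax(seq_len(n) - k, 1)

# sum over the window of width k + 1, current year included
win6 <- cs[seq_len(n) + 1] - cs[start]

# subtracting the current value leaves the sum of the previous k years
lag_passed3 <- win6 - passed
lag_passed3
#> [1] 0 0 0 0 1 3 3 3 4 3
```

This assumes the rows are already sorted by year and that no years are missing, as in the sample data.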


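Note: the helper function f can be replaced with a simpler one by specifying the window through offsets and relying on the default right alignment, as pointed out by @G. Grothendieck: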

f <- function(x) zoo::rollapplyr(x, list(-seq(5)), sum, partial = TRUE, fill = 0)