Simple operation with lagged values
I need to compute a simple row-wise operation on lagged values, for example the sum of a variable over the previous x years.
I tried:
toy %>%
  group_by(student) %>%
  mutate(lag_passed = sum(lag(passed, n = 5, order_by = year, default = 0)))

toy %>%
  group_by(student) %>%
  arrange(year) %>%
  mutate(lag_passed = lapply(passed, function(x) sum(lag(x, n = 5, default = 0))))
Reproducible example. The task is to sum the number of tests passed over the previous five years.
toy <- data.frame(student = rep("A", 10),
                  year = c(1:10),
                  passed = c(0, 0, 0, 1, 2, 0, 0, 1, 0, 1))
   student year passed
1        A    1      0
2        A    2      0
3        A    3      0
4        A    4      1
5        A    5      2
6        A    6      0
7        A    7      0
8        A    8      1
9        A    9      0
10       A   10      1
expected <- data.frame(student = rep("A", 10),
                       year = c(1:10),
                       passed = c(0, 0, 0, 1, 2, 0, 0, 1, 0, 1),
                       lag_passed = c(0, 0, 0, 0, 1, 3, 3, 3, 4, 3))
   student year passed lag_passed
1        A    1      0          0
2        A    2      0          0
3        A    3      0          0
4        A    4      1          0
5        A    5      2          1
6        A    6      0          3
7        A    7      0          3
8        A    8      1          3
9        A    9      0          4
10       A   10      1          3
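runner::sum_run() will help here. Passing idx = year is optional unless some years are missing from your data, in which case it makes the window account for the missing years as well; the sample data has no such gaps. I added grouping by student because in practice you will probably want to do this per student.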
toy <- data.frame(student = rep("A", 10),
                  year = c(1:10),
                  passed = c(0, 0, 0, 1, 2, 0, 0, 1, 0, 1))
library(dplyr)
library(runner)
toy %>%
  group_by(student) %>%
  mutate(lag_passed = sum_run(x = passed,
                              idx = year,
                              k = 5,
                              lag = 1))
#> # A tibble: 10 x 4
#> # Groups: student [1]
#> student year passed lag_passed
#> <chr> <int> <dbl> <dbl>
#> 1 A 1 0 NA
#> 2 A 2 0 0
#> 3 A 3 0 0
#> 4 A 4 1 0
#> 5 A 5 2 1
#> 6 A 6 0 3
#> 7 A 7 0 3
#> 8 A 8 1 3
#> 9 A 9 0 4
#> 10 A 10 1 3
Created on 2021-05-15 by the reprex package (v2.0.0)
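In the sum_run() output the first row is NA, whereas the expected result has 0 there, presumably because no earlier year falls inside the lagged window. A minimal sketch, assuming a 0 is wanted in that case, wraps the call in dplyr::coalesce(); the name result is only illustrative.

```r
library(dplyr)
library(runner)

toy <- data.frame(student = rep("A", 10),
                  year = 1:10,
                  passed = c(0, 0, 0, 1, 2, 0, 0, 1, 0, 1))

result <- toy %>%
  group_by(student) %>%
  # sum_run() yields NA where the lagged window is empty;
  # coalesce() replaces those NAs with 0 (an assumption about the desired default)
  mutate(lag_passed = coalesce(sum_run(x = passed, idx = year, k = 5, lag = 1), 0)) %>%
  ungroup()

result
```

Whether 0 or NA is the better default depends on whether "no history yet" should count as "nothing passed".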
Another rolling-sum solution, using zoo::rollapply:
f <- function(x) {zoo::rollapply(x, 6, sum, align = 'right', partial = TRUE) - x}
expected %>%
group_by(student) %>%
arrange(year) %>%
mutate(lag_passed2 = f(passed)) %>%
ungroup()
# student year passed lag_passed lag_passed2
# <chr> <int> <dbl> <dbl> <dbl>
# 1 A 1 0 0 0
# 2 A 2 0 0 0
# 3 A 3 0 0 0
# 4 A 4 1 0 0
# 5 A 5 2 1 1
# 6 A 6 0 3 3
# 7 A 7 0 3 3
# 8 A 8 1 3 3
# 9 A 9 0 4 4
# 10 A 10 1 3 3
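lag_passed2, created with the helper function, is identical to lag_passed. The idea is to compute a sliding-window sum with a window of length 6 (partial windows at the start are allowed via partial = TRUE, together with align = 'right') and then subtract the current year's value of passed.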
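The arithmetic behind this can be checked by hand. The following is a base-R sketch only (no zoo), with the illustrative names window6 and lag_passed; it rebuilds the partial right-aligned window of up to 6 values and subtracts the current value.

```r
passed <- c(0, 0, 0, 1, 2, 0, 0, 1, 0, 1)
n <- length(passed)

# Sum over the window of up to 6 values ending at position i
# (the window is shorter, i.e. partial, for the first five positions)
window6 <- vapply(seq_len(n),
                  function(i) sum(passed[max(1, i - 5):i]),
                  numeric(1))

# Removing the current value leaves the sum of the previous (up to) 5 values
lag_passed <- window6 - passed
lag_passed
#> [1] 0 0 0 0 1 3 3 3 4 3
```

For year 9, for instance, the window covers years 4 to 9 and sums to 4; subtracting that year's own value of 0 leaves 4, matching the expected column.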
Note: the helper function f can be replaced by a simpler one that uses rollapplyr and specifies the window by offsets, as pointed out by @G. Grothendieck:
f <- function(x) zoo::rollapplyr(x, list(-seq(5)), sum, partial = TRUE, fill = 0)