Lagged difference by group with dplyr
I have the following dataset:
Amount1 Amount2 Date Group
1 NA 350 2019-01-01 A
2 NA 335 2019-01-01 B
3 NA 340 2019-01-01 C
4 300 365 2019-01-06 A
5 310 325 2019-01-06 B
6 285 355 2019-01-06 C
7 310 335 2019-01-11 A
8 305 355 2019-01-11 B
9 335 360 2019-01-11 C
10 280 NA 2019-01-16 A
11 290 NA 2019-01-16 B
12 240 NA 2019-01-16 C
You can recreate it with this:
> dput(test)
structure(list(Amount1 = c(NA, NA, NA, 300, 310, 285, 310, 305, 335, 280, 290, 240),
Amount2 = c(350, 335, 340, 365, 325, 355, 335, 355, 360, NA, NA, NA),
Date = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("2019-01-01", "2019-01-06", "2019-01-11", "2019-01-16"), class = "factor"),
Group = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("A", "B", "C"), class = "factor")),
row.names = c(NA, -12L), class = "data.frame")
For each group, I want to subtract Amount1 from the previous Amount2.
For example, for group A, I have:
2019-01-01 -> NA
2019-01-06 -> 350 - 300 = 50
2019-01-11 -> 365 - 310 = 55
2019-01-16 -> 335 - 280 = 55
How can I do this? I tried using mutate_at, but without success...
# Does not work...
test %>%
group_by(Group, Amount2) %>%
mutate_at(c("Amount1"), funs(AmountDiff = . - lag(Amount2, 1)))
How about this?
test %>%
group_by(Group) %>%
mutate(Amount_diff = lag(Amount2) - Amount1)
This gives:
# A tibble: 12 x 5
# Groups: Group [3]
Amount1 Amount2 Date Group Amount_diff
<dbl> <dbl> <fct> <fct> <dbl>
1 NA 350 2019-01-01 A NA
2 NA 335 2019-01-01 B NA
3 NA 340 2019-01-01 C NA
4 300 365 2019-01-06 A 50
5 310 325 2019-01-06 B 25
6 285 355 2019-01-06 C 55
7 310 335 2019-01-11 A 55
8 305 355 2019-01-11 B 20
9 335 360 2019-01-11 C 20
10 280 NA 2019-01-16 A 55
11 290 NA 2019-01-16 B 65
12 240 NA 2019-01-16 C 120
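As a side note, the attempt in the question most likely returned only NA because it grouped by Amount2 as well as Group. A minimal sketch of that check, assuming the same data rebuilt with plain character columns (the names group_sizes and all_single are only illustrative):

```r
library(dplyr)

# Same values as the dput() output, rebuilt with character columns for brevity
test <- data.frame(
  Amount1 = c(NA, NA, NA, 300, 310, 285, 310, 305, 335, 280, 290, 240),
  Amount2 = c(350, 335, 340, 365, 325, 355, 335, 355, 360, NA, NA, NA),
  Date = rep(c("2019-01-01", "2019-01-06", "2019-01-11", "2019-01-16"), each = 3),
  Group = rep(c("A", "B", "C"), times = 4)
)

# Count the rows in every (Group, Amount2) combination
group_sizes <- test %>% count(Group, Amount2)

# TRUE when every combination holds exactly one row
all_single <- all(group_sizes$n == 1)
```

When each group holds a single row, lag() has no previous row to return inside it, so the difference is NA everywhere. Grouping by Group alone keeps the four dates of each group together.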
For group A:
test %>%
group_by(Group) %>%
mutate(Amount_diff = lag(Amount2) - Amount1) %>%
filter(Group == "A")
gives:
# A tibble: 4 x 5
# Groups: Group [1]
Amount1 Amount2 Date Group Amount_diff
<dbl> <dbl> <fct> <fct> <dbl>
1 NA 350 2019-01-01 A NA
2 300 365 2019-01-06 A 50
3 310 335 2019-01-11 A 55
4 280 NA 2019-01-16 A 55
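Note that lag() follows row order, so the result above relies on the rows already being sorted by date within each group. If that is not guaranteed, one option is to pass order_by to lag(). The sketch below is only an illustration: it assumes the same data rebuilt with character columns, and the reversed row order and the names shuffled and result are made up for the demonstration.

```r
library(dplyr)

# Same values as the dput() output, rebuilt with character columns for brevity
test <- data.frame(
  Amount1 = c(NA, NA, NA, 300, 310, 285, 310, 305, 335, 280, 290, 240),
  Amount2 = c(350, 335, 340, 365, 325, 355, 335, 355, 360, NA, NA, NA),
  Date = rep(c("2019-01-01", "2019-01-06", "2019-01-11", "2019-01-16"), each = 3),
  Group = rep(c("A", "B", "C"), times = 4)
)

# Reverse the rows on purpose to show that the result does not depend on row order
shuffled <- test[nrow(test):1, ]

result <- shuffled %>%
  # Convert Date to a real Date so that ordering is chronological
  mutate(Date = as.Date(as.character(Date))) %>%
  group_by(Group) %>%
  # order_by makes lag() take the previous Amount2 by date, not by row position
  mutate(Amount_diff = lag(Amount2, order_by = Date) - Amount1) %>%
  ungroup() %>%
  arrange(Date, Group)
```

After the final arrange(), the Amount_diff column should match the values shown above.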