Group by one column and calculate the lag difference between current-period values and the previous actual value for mixed monthly/quarterly data in R
Suppose I have the following panel data (edited from another source):
df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("M01",
"M02", "S01"), class = "factor"), date = structure(c(2L, 3L,
4L, 5L, 6L, 7L, 8L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L), .Label = c("2020-12", "2021-01", "2021-02",
"2021-03", "2021-04", "2021-05", "2021-06", "2021-07"), class = "factor"),
actual = c(3.4, 5.4, 7.4, 7.4, 7.5, 8, 8.9, 10.8, 10.1, 8.2,
10.1, 9.4, 10.1, 9.4, -0.3, NA, NA, 8.6, NA, NA, 8.3, NA),
pred = c(3.288889774, 5.819407687, 6.705608369, 6.054457292,
5.582409131, 7.01052472, 9.742902434, 10.98571396, 6.522003651,
9.688977242, 10.39801463, 9.398991615, 9.764616936, 9.855033457,
0.493311422, 8.403722942, 8.174854517, 8.573117852, 8.403065801,
8.684289455, 8.719079247, 8.259439468)), class = "data.frame", row.names = c(NA,
-22L))
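After grouping by id, for each month I need to compute the differences between the current month's actual and predicted values and the previous month's actual value. The exception is id == 'S01', which is quarterly rather than monthly data. For it I need the difference against the actual value of the last month of the previous quarter, i.e. 2021-03 is compared with 2020-12 rather than 2021-02, and the same logic applies to the other months.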
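Put differently, the reference value for each row is the most recent non-missing actual strictly before that row within the same id. For the monthly ids this is just the previous month; for S01 it is the last quarter-end value. A minimal base-R sketch of that rule, assuming rows are already in date order; `prev_available()` is a hypothetical helper, not from any package:

```r
# Hypothetical helper: for each position, return the most recent non-NA
# value strictly before it, or NA if there is none.
prev_available <- function(x) {
  if (length(x) == 0L) return(numeric(0))
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L                     # NA positions never become a reference
  last_idx <- cummax(idx)                 # index of last non-NA at or before i
  ref_idx <- c(0L, head(last_idx, -1L))   # shift by one row: strictly before i
  out <- rep(NA_real_, length(x))
  ok <- ref_idx > 0L
  out[ok] <- x[ref_idx[ok]]
  out
}

# S01's actual values, 2020-12 through 2021-07
s01_actual <- c(-0.3, NA, NA, 8.6, NA, NA, 8.3, NA)
prev_available(s01_actual)
# NA -0.3 -0.3 -0.3 8.6 8.6 8.6 8.3
```

Using `cummax()` over the positions of observed values avoids an explicit loop; it relies only on the rows being sorted by date within the id.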
My attempted code:
library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  mutate(actual2 = actual) %>%
  fill(actual2) %>%
  mutate(act_diff = case_when(
           actual2 > lag(actual2) ~ actual2 - lag(actual2),
           actual2 < lag(actual2) ~ actual2 - lag(actual2),
           actual2 == lag(actual2) ~ 0),
         pred_diff = case_when(
           pred > lag(actual2) ~ pred - lag(actual2),
           pred < lag(actual2) ~ pred - lag(actual2),
           pred == lag(actual2) ~ 0),
         act_diff = ifelse((id == 'S01') & is.na(actual), NA, act_diff),
         pred_diff = ifelse((id == 'S01') & is.na(actual), NA, pred_diff),
         actual2 = NULL) %>%
  print(n = 22)
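As an aside on the attempt, the three `case_when()` branches (`>`, `<`, `==`) all reduce to plain subtraction: it already gives a signed difference, 0 for ties, and NA when either side is missing. A base-R sketch of that equivalence, using nested `ifelse()` as a stand-in for `case_when()` on made-up toy vectors:

```r
# Toy values (hypothetical); the last pair has a missing current value
current  <- c(7.4, 7.4, 7.5, NA)
previous <- c(5.4, 7.4, 8.0, 7.5)

# Three-branch version, mirroring the >, <, == branches of the attempt
three_way <- ifelse(current > previous, current - previous,
             ifelse(current < previous, current - previous,
             ifelse(current == previous, 0, NA)))

# Plain subtraction
plain <- current - previous

identical(three_way, plain)  # TRUE
```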
Result:
id date actual pred act_diff pred_diff
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 M01 2021-01 3.4 3.29 NA NA
2 M01 2021-02 5.4 5.82 2 2.42
3 M01 2021-03 7.4 6.71 2 1.31
4 M01 2021-04 7.4 6.05 0 -1.35
5 M01 2021-05 7.5 5.58 0.100 -1.82
6 M01 2021-06 8 7.01 0.5 -0.489
7 M01 2021-07 8.9 9.74 0.9 1.74
8 M02 2021-01 10.8 11.0 NA NA
9 M02 2021-02 10.1 6.52 -0.700 -4.28
10 M02 2021-03 8.2 9.69 -1.9 -0.411
11 M02 2021-04 10.1 10.4 1.9 2.20
12 M02 2021-05 9.4 9.40 -0.700 -0.701
13 M02 2021-06 10.1 9.76 0.700 0.365
14 M02 2021-07 9.4 9.86 -0.700 -0.245
15 S01 2020-12 -0.3 0.493 NA NA
16 S01 2021-01 NA 8.40 NA NA
17 S01 2021-02 NA 8.17 NA NA
18 S01 2021-03 8.6 8.57 8.9 8.87
19 S01 2021-04 NA 8.40 NA NA
20 S01 2021-05 NA 8.68 NA NA
21 S01 2021-06 8.3 8.72 -0.300 0.119
22 S01 2021-07 NA 8.26 NA NA
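Compared with the expected output in the question, the only mismatch is `pred_diff` for S01. The last two `ifelse()` calls blank `pred_diff` whenever `actual` is missing, even though `pred` is always present. A base-R sketch on the S01 rows alone (values copied from the question's data) that carries the last actual forward with an explicit loop, lags it by one row, and leaves `pred_diff` unmasked:

```r
# S01 rows from the question's data, in date order (2020-12 .. 2021-07)
actual <- c(-0.3, NA, NA, 8.6, NA, NA, 8.3, NA)
pred   <- c(0.493311422, 8.403722942, 8.174854517, 8.573117852,
            8.403065801, 8.684289455, 8.719079247, 8.259439468)

# Carry the last observed actual forward (what tidyr::fill() does by default)
filled <- actual
for (i in seq_along(filled)[-1]) {
  if (is.na(filled[i])) filled[i] <- filled[i - 1]
}

# Lag by one row: the reference is the last actual observed before this row
ref <- c(NA, head(filled, -1))

act_diff  <- actual - ref   # stays NA wherever actual is NA
pred_diff <- pred - ref     # not masked, because pred is always present
round(pred_diff, 7)
```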
How can I modify the code above to get the expected result below? Thanks.
id date actual pred act_diff pred_diff
1 M01 2021-01 3.4 3.2888898 NA NA
2 M01 2021-02 5.4 5.8194077 2.0 2.4194077
3 M01 2021-03 7.4 6.7056084 2.0 1.3056084
4 M01 2021-04 7.4 6.0544573 0.0 -1.3455427
5 M01 2021-05 7.5 5.5824091 0.1 -1.8175909
6 M01 2021-06 8.0 7.0105247 0.5 -0.4894753
7 M01 2021-07 8.9 9.7429024 0.9 1.7429024
8 M02 2021-01 10.8 10.9857140 NA NA
9 M02 2021-02 10.1 6.5220037 -0.7 -4.2779963
10 M02 2021-03 8.2 9.6889772 -1.9 -0.4110228
11 M02 2021-04 10.1 10.3980146 1.9 2.1980146
12 M02 2021-05 9.4 9.3989916 -0.7 -0.7010084
13 M02 2021-06 10.1 9.7646169 0.7 0.3646169
14 M02 2021-07 9.4 9.8550335 -0.7 -0.2449665
15 S01 2020-12 -0.3 0.4933114 NA NA
16 S01 2021-01 NA 8.4037229 NA 8.7037229 # calculate with S01's actual value in 2020-12
17 S01 2021-02 NA 8.1748545 NA 8.4748545 # calculate with S01's actual value in 2020-12
18 S01 2021-03 8.6 8.5731179 8.9 8.8731179 # calculate with S01's actual value in 2020-12
19 S01 2021-04 NA 8.4030658 NA -0.1969342 # calculate with S01's actual value in 2021-03
20 S01 2021-05 NA 8.6842895 NA 0.0842895 # calculate with S01's actual value in 2021-03
21 S01 2021-06 8.3 8.7190792 -0.3 0.1190792 # calculate with S01's actual value in 2021-03
22 S01 2021-07 NA 8.2594395 NA -0.0405605 # calculate with S01's actual value in 2021-06
Something like this?
library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  mutate(actual_fill = actual) %>%   # copy, so actual keeps its NAs
  fill(actual_fill) %>%              # carry the last observed actual forward within each id
  mutate(act_diff  = actual - if_else(id == "S01", lag(actual_fill), lag(actual)),
         pred_diff = pred   - if_else(id == "S01", lag(actual_fill), lag(actual))) %>%
  ungroup()
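The `if_else()` on id only matters when a group has gaps. Filling forward a vector that has no missing values leaves it unchanged, so for the monthly ids the lagged filled column and the lagged raw column coincide. A small base-R check; `fill_down()` and `lag1()` are hypothetical helpers standing in for `tidyr::fill()` and `dplyr::lag()`:

```r
# Hypothetical helper mimicking tidyr::fill(.direction = "down") on one vector
fill_down <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Hypothetical one-row lag, like dplyr::lag(x)
lag1 <- function(x) c(NA, head(x, -1))

m01 <- c(3.4, 5.4, 7.4, 7.4, 7.5, 8, 8.9)      # monthly, no gaps
s01 <- c(-0.3, NA, NA, 8.6, NA, NA, 8.3, NA)   # quarterly, with gaps

identical(lag1(fill_down(m01)), lag1(m01))  # TRUE: filling is a no-op
identical(lag1(fill_down(s01)), lag1(s01))  # FALSE: the gaps get filled
```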
In fact, as far as I can tell, this can be simplified further:
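df %>%
  group_by(id) %>%
  mutate(actual_fill = actual) %>%   # keep actual untouched, fill a copy
  fill(actual_fill) %>%
  mutate(act_diff  = actual - lag(actual_fill),   # NA wherever actual is NA
         pred_diff = pred   - lag(actual_fill)) %>%
  ungroup()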
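On this data the two versions are equivalent: the monthly ids have no missing actual values, so fill() leaves them unchanged and lag(actual_fill) equals lag(actual). Both produce:
id date actual pred actual_fill act_diff pred_diff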
1 M01 2021-01 3.4 3.2888898 3.4 NA NA
2 M01 2021-02 5.4 5.8194077 5.4 2.0 2.41940769
3 M01 2021-03 7.4 6.7056084 7.4 2.0 1.30560837
4 M01 2021-04 7.4 6.0544573 7.4 0.0 -1.34554271
5 M01 2021-05 7.5 5.5824091 7.5 0.1 -1.81759087
6 M01 2021-06 8.0 7.0105247 8.0 0.5 -0.48947528
7 M01 2021-07 8.9 9.7429024 8.9 0.9 1.74290243
8 M02 2021-01 10.8 10.9857140 10.8 NA NA
9 M02 2021-02 10.1 6.5220037 10.1 -0.7 -4.27799635
10 M02 2021-03 8.2 9.6889772 8.2 -1.9 -0.41102276
11 M02 2021-04 10.1 10.3980146 10.1 1.9 2.19801463
12 M02 2021-05 9.4 9.3989916 9.4 -0.7 -0.70100838
13 M02 2021-06 10.1 9.7646169 10.1 0.7 0.36461694
14 M02 2021-07 9.4 9.8550335 9.4 -0.7 -0.24496654
15 S01 2020-12 -0.3 0.4933114 -0.3 NA NA
16 S01 2021-01 NA 8.4037229 -0.3 NA 8.70372294
17 S01 2021-02 NA 8.1748545 -0.3 NA 8.47485452
18 S01 2021-03 8.6 8.5731179 8.6 8.9 8.87311785
19 S01 2021-04 NA 8.4030658 8.6 NA -0.19693420
20 S01 2021-05 NA 8.6842895 8.6 NA 0.08428946
21 S01 2021-06 8.3 8.7190792 8.3 -0.3 0.11907925
22 S01 2021-07 NA 8.2594395 8.3 NA -0.04056053
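For readers who would rather avoid the dplyr/tidyr dependency, here is a base-R sketch of the same computation (not part of the original answer). It reuses the question's data; `prev_available()` is a hypothetical helper returning, for each row, the most recent non-missing actual strictly before it, which is the same value as `lag(actual_fill)` above.

```r
# The question's data, as given by dput()
df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("M01",
"M02", "S01"), class = "factor"), date = structure(c(2L, 3L,
4L, 5L, 6L, 7L, 8L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L), .Label = c("2020-12", "2021-01", "2021-02",
"2021-03", "2021-04", "2021-05", "2021-06", "2021-07"), class = "factor"),
    actual = c(3.4, 5.4, 7.4, 7.4, 7.5, 8, 8.9, 10.8, 10.1, 8.2,
    10.1, 9.4, 10.1, 9.4, -0.3, NA, NA, 8.6, NA, NA, 8.3, NA),
    pred = c(3.288889774, 5.819407687, 6.705608369, 6.054457292,
    5.582409131, 7.01052472, 9.742902434, 10.98571396, 6.522003651,
    9.688977242, 10.39801463, 9.398991615, 9.764616936, 9.855033457,
    0.493311422, 8.403722942, 8.174854517, 8.573117852, 8.403065801,
    8.684289455, 8.719079247, 8.259439468)), class = "data.frame", row.names = c(NA,
-22L))

# Hypothetical helper: most recent non-NA value strictly before each position
prev_available <- function(x) {
  if (length(x) == 0L) return(numeric(0))
  idx <- seq_along(x)
  idx[is.na(x)] <- 0L
  last_idx <- cummax(idx)
  ref_idx <- c(0L, head(last_idx, -1L))
  out <- rep(NA_real_, length(x))
  ok <- ref_idx > 0L
  out[ok] <- x[ref_idx[ok]]
  out
}

# Sort within id by date; the date factor levels are already chronological
df <- df[order(df$id, df$date), ]

# ave() applies the helper separately within each id
ref <- ave(df$actual, df$id, FUN = prev_available)
df$act_diff  <- df$actual - ref   # NA wherever actual is NA
df$pred_diff <- df$pred - ref
df
```

For the monthly ids, which have no gaps, the helper reduces to a plain one-row lag, so no special case for S01 is needed.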