按一列分组并使用R计算月度、季度混合数据当前期间值与前一个的滞后差

Groupby one column and calculate lag difference of monthly, quarterly mixed data's current period values with previous one using R

假设我有如下面板数据,它是从编辑的:

df <- structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("M01", 
"M02", "S01"), class = "factor"), date = structure(c(2L, 3L, 
4L, 5L, 6L, 7L, 8L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L), .Label = c("2020-12", "2021-01", "2021-02", 
"2021-03", "2021-04", "2021-05", "2021-06", "2021-07"), class = "factor"), 
    actual = c(3.4, 5.4, 7.4, 7.4, 7.5, 8, 8.9, 10.8, 10.1, 8.2, 
    10.1, 9.4, 10.1, 9.4, -0.3, NA, NA, 8.6, NA, NA, 8.3, NA), 
    pred = c(3.288889774, 5.819407687, 6.705608369, 6.054457292, 
    5.582409131, 7.01052472, 9.742902434, 10.98571396, 6.522003651, 
    9.688977242, 10.39801463, 9.398991615, 9.764616936, 9.855033457, 
    0.493311422, 8.403722942, 8.174854517, 8.573117852, 8.403065801, 
    8.684289455, 8.719079247, 8.259439468)), class = "data.frame", row.names = c(NA, 
-22L)) 

在groupbyid之后,对于每个月的实际值和预测值,我需要用上个月的实际值计算当月的实际值和预测值,除了:假设id=='S01',它是季度数据而不是月度数据,因此我需要计算当前月份与上一季度上个月实际值的差值,即 2021-03 年与 2020-12 年而不是 2021-02 年,其他月份的逻辑相同。

我的尝试代码:

df %>% 
  group_by(id) %>% 
  mutate(actual2=actual) %>% 
  fill(actual2) %>% 
  mutate(act_diff = case_when(
    actual2 > lag(actual2) ~ actual2 - lag(actual2), 
    actual2 < lag(actual2) ~ actual2 - lag(actual2), 
    actual2 == lag(actual2) ~ 0), 
    pred_diff = case_when(
      pred > lag(actual2) ~ pred - lag(actual2), 
      pred < lag(actual2) ~ pred - lag(actual2), 
      pred == lag(actual2) ~ 0), 
    act_diff = ifelse((id=='S01')&is.na(actual), NA, act_diff),
    pred_diff = ifelse((id=='S01')&is.na(actual), NA, pred_diff), actual2=NULL) %>%
  print(n=22)

结果:

   id    date    actual   pred act_diff pred_diff
   <fct> <fct>    <dbl>  <dbl>    <dbl>     <dbl>
 1 M01   2021-01    3.4  3.29    NA        NA    
 2 M01   2021-02    5.4  5.82     2         2.42 
 3 M01   2021-03    7.4  6.71     2         1.31 
 4 M01   2021-04    7.4  6.05     0        -1.35 
 5 M01   2021-05    7.5  5.58     0.100    -1.82 
 6 M01   2021-06    8    7.01     0.5      -0.489
 7 M01   2021-07    8.9  9.74     0.9       1.74 
 8 M02   2021-01   10.8 11.0     NA        NA    
 9 M02   2021-02   10.1  6.52    -0.700    -4.28 
10 M02   2021-03    8.2  9.69    -1.9      -0.411
11 M02   2021-04   10.1 10.4      1.9       2.20 
12 M02   2021-05    9.4  9.40    -0.700    -0.701
13 M02   2021-06   10.1  9.76     0.700     0.365
14 M02   2021-07    9.4  9.86    -0.700    -0.245
15 S01   2020-12   -0.3  0.493   NA        NA    
16 S01   2021-01   NA    8.40    NA        NA    
17 S01   2021-02   NA    8.17    NA        NA    
18 S01   2021-03    8.6  8.57     8.9       8.87 
19 S01   2021-04   NA    8.40    NA        NA    
20 S01   2021-05   NA    8.68    NA        NA    
21 S01   2021-06    8.3  8.72    -0.300     0.119
22 S01   2021-07   NA    8.26    NA        NA 

我如何修改上面的代码以获得这样的预期结果?感谢:

    id    date actual       pred act_diff  pred_diff
1  M01 2021-01    3.4  3.2888898       NA         NA
2  M01 2021-02    5.4  5.8194077      2.0  2.4194077
3  M01 2021-03    7.4  6.7056084      2.0  1.3056084
4  M01 2021-04    7.4  6.0544573      0.0 -1.3455427
5  M01 2021-05    7.5  5.5824091      0.1 -1.8175909
6  M01 2021-06    8.0  7.0105247      0.5 -0.4894753
7  M01 2021-07    8.9  9.7429024      0.9  1.7429024
8  M02 2021-01   10.8 10.9857140       NA         NA
9  M02 2021-02   10.1  6.5220037     -0.7 -4.2779963
10 M02 2021-03    8.2  9.6889772     -1.9 -0.4110228
11 M02 2021-04   10.1 10.3980146      1.9  2.1980146
12 M02 2021-05    9.4  9.3989916     -0.7 -0.7010084
13 M02 2021-06   10.1  9.7646169      0.7  0.3646169
14 M02 2021-07    9.4  9.8550335     -0.7 -0.2449665
15 S01 2020-12   -0.3  0.4933114       NA         NA
16 S01 2021-01     NA  8.4037229       NA  8.7037229  # calculate with S01's actual value in 2020-12
17 S01 2021-02     NA  8.1748545       NA  8.4748545  # calculate with S01's actual value in 2020-12
18 S01 2021-03    8.6  8.5731179      8.9  8.8731179  # calculate with S01's actual value in 2020-12
19 S01 2021-04     NA  8.4030658       NA -0.1969342  # calculate with S01's actual value in 2021-03
20 S01 2021-05     NA  8.6842895       NA  0.0842895  # calculate with S01's actual value in 2021-03
21 S01 2021-06    8.3  8.7190792     -0.3  0.1190792  # calculate with S01's actual value in 2021-03
22 S01 2021-07     NA  8.2594395       NA -0.0405605  # calculate with S01's actual value in 2021-06

大概是这样的?

df %>%
  group_by(id) %>%
  mutate(actual_fill = actual) %>%
  fill(actual_fill) %>%
  mutate(act_diff = actual - if_else(id == "S01", lag(actual_fill), lag(actual)),
         pref_diff = pred - if_else(id == "S01", lag(actual_fill), lag(actual))) %>%
  ungroup() 

事实上据我所知这可以进一步简化:

df %>%
  group_by(id) %>%
  fill(actual) %>%
  mutate(act_diff = actual - lag(actual),
         pref_diff = pred - lag(actual)) %>%
  ungroup() 

我认为这里是等价的...

    id    date actual       pred actual_fill act_diff   pref_diff
1  M01 2021-01    3.4  3.2888898         3.4       NA          NA
2  M01 2021-02    5.4  5.8194077         5.4      2.0  2.41940769
3  M01 2021-03    7.4  6.7056084         7.4      2.0  1.30560837
4  M01 2021-04    7.4  6.0544573         7.4      0.0 -1.34554271
5  M01 2021-05    7.5  5.5824091         7.5      0.1 -1.81759087
6  M01 2021-06    8.0  7.0105247         8.0      0.5 -0.48947528
7  M01 2021-07    8.9  9.7429024         8.9      0.9  1.74290243
8  M02 2021-01   10.8 10.9857140        10.8       NA          NA
9  M02 2021-02   10.1  6.5220037        10.1     -0.7 -4.27799635
10 M02 2021-03    8.2  9.6889772         8.2     -1.9 -0.41102276
11 M02 2021-04   10.1 10.3980146        10.1      1.9  2.19801463
12 M02 2021-05    9.4  9.3989916         9.4     -0.7 -0.70100838
13 M02 2021-06   10.1  9.7646169        10.1      0.7  0.36461694
14 M02 2021-07    9.4  9.8550335         9.4     -0.7 -0.24496654
15 S01 2020-12   -0.3  0.4933114        -0.3       NA          NA
16 S01 2021-01     NA  8.4037229        -0.3       NA  8.70372294
17 S01 2021-02     NA  8.1748545        -0.3       NA  8.47485452
18 S01 2021-03    8.6  8.5731179         8.6      8.9  8.87311785
19 S01 2021-04     NA  8.4030658         8.6       NA -0.19693420
20 S01 2021-05     NA  8.6842895         8.6       NA  0.08428946
21 S01 2021-06    8.3  8.7190792         8.3     -0.3  0.11907925
22 S01 2021-07     NA  8.2594395         8.3       NA -0.04056053