R递归+滞后前一个相关值

R Recursive + Lag for previous dependent value

我已经阅读了很多关于滞后或累积的内容,但仍然无法弄清楚如何对我的工作进行处理。 我已经制定了如下所示的我的简化数据以及我尝试解决此问题的方法

library(tidyverse)
library(lubridate)

currDt = ymd(20211130)

polyData = structure(list(CCDATE = structure(c(18716, 18715, 18713, 18712, 8895, 18498, 18690, 18689, 18688, 18323), class = "Date"),
                          CRDATE = structure(c(19080, 18898, 18896, 18895, 18895, 18862, 19054, 19053, 19052, 18687), class = "Date"),
                          mREGNO = c("BBB", "BBB", "BBB", "BBB", "BBB", "BBB", "DDD", "DDD", "DDD", "DDD"),
                          SCHDRNUM = c(7711858, 7710223, 7709517, 7709318, 7709172, 7487762, 7686649, 7684567, 7683420, 7374917),
                          MCCDATE = structure(c(18686, 18686, 18685, 18684, 18864, 18467, 18662, 18661, 18660, 18294), class = "Date")),
                     row.names = c(NA, -10L),
                     class = c("tbl_df", "tbl", "data.frame"))

# sort accordingly for row processing next step
# by REGNO and latest CHDRNUM
polyData = polyData %>%
  arrange(mREGNO, -SCHDRNUM)

n = polyData %>%
  group_by(mREGNO) %>%
  tally() %>%
  ungroup() %>%
  summarise(n = max(n))

n = n$n

polyData_backUp = polyData

# get carried backward CCDATE minus 1 month
# first round is temporary create columns for later use
polyData = polyData %>%
  mutate(# temporary Keep for latest tran of every mREGNO
         Keep = case_when(is.na(lag(mREGNO)) ~ 1, # latest tran or data
                          mREGNO != lag(mREGNO) ~ 1, # latest tran of every mREGNo
                          TRUE ~ 0),
         
         # temporary Carried_MCCDATE for latest and second latest tran of every mREGNO
         Carried_MCCDATE = case_when(is.na(lag(mREGNO)) ~ MCCDATE, # latest tran of data
                                     mREGNO != lag(mREGNO) ~ MCCDATE, # latest tran of every mREGNo
                                     lag(Keep) == 1 ~ lag(MCCDATE), # if latest tran is ok then only carry backwards
                                     TRUE ~ currDt)) # temporary as at date

# keep calculate base on the updated info for every rows
for (i in 2:n) {
  polyData = polyData %>%
    mutate(Keep = case_when(is.na(lag(mREGNO)) ~ 1, # latest tran or data
                            mREGNO != lag(mREGNO) ~ 1, # latest tran of every mREGNo
                            CCDATE <= Carried_MCCDATE ~ 1, # previous CCDATE must have at least one month gap
                            TRUE ~ 0),
           
           Carried_MCCDATE = case_when(is.na(lag(mREGNO)) ~ MCCDATE, # latest tran of data
                                       mREGNO != lag(mREGNO) ~ MCCDATE, # latest tran of every mREGNo
                                       lag(Keep) == 1 ~ lag(MCCDATE), # if latest tran is ok then only carry backwards
                                       TRUE ~ lag(Carried_MCCDATE))) # if latest tran is not ok then will carry from carried
}

基本上从CCDATE到MCCDATE的列都是按mREGNO和SCHDRNUM(降序)排序的原始数据; Carried_MCCDATE 中保留的列是我想要的理想结果。

标准如下,我已经在 excel 中计算出来并尝试在 R 中滞后,但它不起作用。原因是因为我需要同时处理 Carried_MCCDATE 和另一个依赖列的递归公式。

对于这么小的这个示例,我设法用一个循环来完成它,但是无论如何我可以通过使用 tidyverse 来实现吗?因为我的实际数据要大很多。

The Carried_MCCDATE formula would be:
1) If first row of different mREGNO then = MCCDATE
2) If not first row and previous row's keep = 1 then = previous row's MCCDATE
3) Else = previous row's Carried_MCCDATE

Excel formula for cell F1 =IF(C2<>C1,E2,IF(G1=1,E1,F1))

The Keep formula would be:
1) If first row of different mREGNO then = 1
2) If CCDATE <= Carried_MCCDATE then = 1
3) Else = 0

Excel formula for cell G1 =IF(C2<>C1,1,IF(A2<=F2,1,0))

请提前帮忙,非常感谢!!

假设输入是polyData_backUp我们可以使用accumulate来自purrr.

library(dplyr)
library(lubridate)
library(purrr)    
     
polyData_backUp  %>%
 group_by(mREGNO) %>%
 mutate(carried = as_date(accumulate(2:n(), .init = first(MCCDATE),
  function(carried, i) if (CCDATE[i-1] <= carried) MCCDATE[i-1] else carried))) %>%
 ungroup

给予:

# A tibble: 10 x 6
   CCDATE     CRDATE     mREGNO SCHDRNUM MCCDATE    carried   
   <date>     <date>     <chr>     <dbl> <date>     <date>    
 1 2021-03-30 2022-03-29 BBB     7711858 2021-02-28 2021-02-28
 2 2021-03-29 2021-09-28 BBB     7710223 2021-02-28 2021-02-28
 3 2021-03-27 2021-09-26 BBB     7709517 2021-02-27 2021-02-28
 4 2021-03-26 2021-09-25 BBB     7709318 2021-02-26 2021-02-28
 5 1994-05-10 2021-09-25 BBB     7709172 2021-08-25 2021-02-28
 6 2020-08-24 2021-08-23 BBB     7487762 2020-07-24 2021-08-25
 7 2021-03-04 2022-03-03 DDD     7686649 2021-02-04 2021-02-04
 8 2021-03-03 2022-03-02 DDD     7684567 2021-02-03 2021-02-04
 9 2021-03-02 2022-03-01 DDD     7683420 2021-02-02 2021-02-04
10 2020-03-02 2021-03-01 DDD     7374917 2020-02-02 2021-02-04