Compare whether incident is new or already exists

There is an algorithm that identifies issues/incidents in a network and then writes all cases to a database. Let's say it looks like this (simplified):

ID Date Case
A1 2022-01-01 1
B1 2022-01-01 2
C1 2022-01-01 3
A1 2022-01-02 NA
C1 2022-01-02 NA
A1 2022-01-03 NA
B1 2022-01-03 NA
C1 2022-01-03 NA

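Each row represents an incident.
Now I want to determine whether an incident already existed the last time we ran this script. To do this, the script should take the current date and compare it with the last existing date in the table.
Note: the last date is not necessarily yesterday; it may be up to 7 days earlier.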

So the logic would be:

Update 11.05.2022 - 17:02:

Expected result:

ID Date Case Comment
A1 2022-01-01 1
B1 2022-01-01 2
C1 2022-01-01 3
A1 2022-01-02 1
C1 2022-01-02 3
A1 2022-01-03 1
B1 2022-01-03 4 New case, as there wasn't B1 on 2022-01-02
C1 2022-01-03 3

I was able to determine the second-highest date (nth() comes from dplyr):

> df[, nth(unique(Date),length(unique(Date))-1), ID]
   ID         V1
1: A1 2022-01-02 ## TRUE, as it's the second highest Date
2: B1 2022-01-01 ## FALSE, as it's not the second highest Date
3: C1 2022-01-02 ## TRUE, as it's the second highest Date
> df[, nth(unique(Date),length(unique(Date))-1)]
[1] "2022-01-02" ## Second highest Date in df
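Since a data.table solution is preferred, here is a sketch of the same computation without dplyr's nth(): sorting the distinct dates in decreasing order and taking the second element gives the second-highest date per ID (NA for an ID that has only one date). The names per_id, run_dates and prev_run are illustrative, not part of the original code.

```r
library(data.table)

df <- data.table(
  ID   = c("A1", "B1", "C1", "A1", "C1", "A1", "B1", "C1"),
  Date = as.Date(c("2022-01-01", "2022-01-01", "2022-01-01", "2022-01-02",
                   "2022-01-02", "2022-01-03", "2022-01-03", "2022-01-03"))
)

# Per ID: second-highest distinct date (NA when an ID has only one date)
per_id <- df[, .(V1 = sort(unique(Date), decreasing = TRUE)[2L]), by = ID]

# Whole table: second-highest distinct date, i.e. the date of the previous run
run_dates <- sort(unique(df$Date))
prev_run  <- run_dates[length(run_dates) - 1L]
```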

But now I'm struggling to create a new column based on this condition. Can anyone help? A data.table solution is preferred, but dplyr would also be great.
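A data.table-only sketch of that condition, assuming the "previous run" is simply the previous distinct Date present in the table (so a gap of up to 7 days between runs does not matter). It flags each row with whether the same ID also appeared on the previous run date; the columns prev_run and existed are hypothetical names chosen for illustration.

```r
library(data.table)

df <- data.table(
  ID   = c("A1", "B1", "C1", "A1", "C1", "A1", "B1", "C1"),
  Date = as.Date(c("2022-01-01", "2022-01-01", "2022-01-01", "2022-01-02",
                   "2022-01-02", "2022-01-03", "2022-01-03", "2022-01-03"))
)

# Distinct run dates in chronological order, each paired with its predecessor
runs <- data.table(Date = sort(unique(df$Date)))
runs[, prev_run := shift(Date)]   # NA for the very first run

# Attach each row's previous run date (update join on Date)
df[runs, prev_run := i.prev_run, on = "Date"]

# Did the same ID appear on the previous run date?
seen <- unique(df[, .(ID, prev_run = Date)])
df[, existed := FALSE]
df[seen, existed := TRUE, on = .(ID, prev_run)]
```

Rows where existed is FALSE (apart from the first run, which has no predecessor) are the new incidents; in this data that is only B1 on 2022-01-03.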


MWE

library(data.table)

df = data.table(ID=c("A1", "B1", "C1", "A1", "C1", "A1", "B1", "C1"),
            Date=as.Date(c("2022-01-01","2022-01-01","2022-01-01","2022-01-02","2022-01-02","2022-01-03", "2022-01-03", "2022-01-03")),
            Case = NA)


Goal = data.table(ID=c("A1", "B1", "C1", "A1", "C1", "A1", "B1", "C1"),
                Date=as.Date(c("2022-01-01","2022-01-01","2022-01-01","2022-01-02","2022-01-02","2022-01-03", "2022-01-03", "2022-01-03")),
                Case=c(1,2,3,1,3,1,4,3))

How about this:

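df[order(Date), d:=c(1,diff(Date)), by = ID][   # d = days since this ID's previous row (1 for its first row)
  order(d,ID),case:=rleid(ID,d)][               # sorted by (d, ID), each ID's d == 1 rows form one run; rows after a gap get new ids
    ,d:=NULL]                                   # drop the helper column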

Output:

   ID       Date case
1: A1 2022-01-01    1
2: B1 2022-01-01    2
3: C1 2022-01-01    3
4: A1 2022-01-02    1
5: C1 2022-01-02    3
6: A1 2022-01-03    1
7: B1 2022-01-03    4
8: C1 2022-01-03    3
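To see why rleid(ID, d) produces these numbers, it helps to keep the helper column d instead of dropping it. This sketch repeats the answer's first step and shows the intermediate ordering; steps is an illustrative name.

```r
library(data.table)

df <- data.table(
  ID   = c("A1", "B1", "C1", "A1", "C1", "A1", "B1", "C1"),
  Date = as.Date(c("2022-01-01", "2022-01-01", "2022-01-01", "2022-01-02",
                   "2022-01-02", "2022-01-03", "2022-01-03", "2022-01-03"))
)

# Same first step as the answer, but keep d so it can be inspected
df[order(Date), d := c(1, diff(Date)), by = ID]

# Sorting by (d, ID) puts every ID's continuing rows (d == 1) together,
# followed by rows that come after a gap (d > 1); rleid then numbers the runs
steps <- df[order(d, ID)]
steps[, case := rleid(ID, d)]
print(steps)
```

The three A1 rows, the first B1 row and the three C1 rows form runs 1, 2 and 3; the B1 row after the one-day gap (d = 2) sorts last and becomes run 4.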

If you really want the comment column, you can extend the code above like this:

df[order(Date), d:=c(1,diff(Date)), by = ID][
  order(d,ID),`:=`(
    case=rleid(ID,d),
    comment=fifelse(d!=1,paste0("New case, as there was no ", ID, " on ",Date-1),""))][   # comment only rows that follow a gap
      ,d:=NULL][]

Output:

   ID       Date case                                    comment
1: A1 2022-01-01    1                                           
2: B1 2022-01-01    2                                           
3: C1 2022-01-01    3                                           
4: A1 2022-01-02    1                                           
5: C1 2022-01-02    3                                           
6: A1 2022-01-03    1                                           
7: B1 2022-01-03    4 New case, as there was no B1 on 2022-01-02
8: C1 2022-01-03    3
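The answer measures gaps in calendar days, while the question notes that runs can be up to 7 days apart. Also, because rows are grouped by the value of d, a later d == 1 row that continues a re-opened incident would share the ID's first case number. A minimal sketch of a variant, assuming "consecutive" should mean consecutive script runs and that every run date appears in the table: it numbers episodes per ID and gives each new episode its own case number. The helper columns run, new_episode and episode are hypothetical names; note that setorder() reorders df by reference.

```r
library(data.table)

df <- data.table(
  ID   = c("A1", "B1", "C1", "A1", "C1", "A1", "B1", "C1"),
  Date = as.Date(c("2022-01-01", "2022-01-01", "2022-01-01", "2022-01-02",
                   "2022-01-02", "2022-01-03", "2022-01-03", "2022-01-03"))
)

# Index each distinct run date 1, 2, 3, ... so "consecutive" means
# consecutive runs, regardless of the calendar gap between them
run_dates <- sort(unique(df$Date))
df[, run := match(Date, run_dates)]

# A row starts a new episode if it is the ID's first row or the ID
# was absent from the immediately preceding run
setorder(df, ID, Date)
df[, new_episode := c(TRUE, diff(run) != 1L), by = ID]
df[, episode := cumsum(new_episode), by = ID]

# Number episodes globally in order of their first run, ties broken by ID
starts <- df[new_episode == TRUE, .(ID, episode, run)]
setorder(starts, run, ID)
starts[, case := seq_len(.N)]

# Every row inherits the case number of the episode it belongs to
df[starts, case := i.case, on = .(ID, episode)]
setorder(df, Date, ID)
df[, c("run", "new_episode", "episode") := NULL]
```

On the MWE data this reproduces the Goal column (1, 2, 3, 1, 3, 1, 4, 3).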