Create new Event_ID based on ID with sliding window on date column

Question

Suppose I have a table like this:

ID	Date
1	2021-01-01
1	2021-01-05
1	2021-01-17
1	2021-02-01
1	2021-02-18
1	2021-02-28
1	2021-03-30
2	2021-01-01
2	2021-01-14
2	2021-02-15

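I want to select all the data in this table, but add a new column containing a new Event_ID. An event is defined as all rows with the same ID that fall within a 15-day time frame. The catch is that I want the time frame to slide. Take the first 3 rows: row 2 is within 15 days of row 1, so they belong to the same event. Row 3 is within 15 days of row 2 (but more than 15 days from row 1), yet I want it added to the same event as before. (Note: the table is not sorted like in the example; it is sorted here only for convenience.)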
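To make the sliding rule concrete, here is a small worked check on the ID 1 dates from the table above. It is only an illustration, not part of the question: the variable names are made up, and it assumes, as the expected output below implies, that a gap of exactly 15 days (2021-01-17 to 2021-02-01) still counts as the same event, so only a gap strictly greater than 15 days starts a new event.

```r
# Dates for ID 1 from the sample table, in sorted order
d <- as.Date(c("2021-01-01", "2021-01-05", "2021-01-17", "2021-02-01",
               "2021-02-18", "2021-02-28", "2021-03-30"))

# Gap in days between each row and the previous row
gaps <- as.numeric(diff(d))
gaps                      # 4 12 15 17 10 30

# Row 3 is 16 days from row 1, so a fixed window anchored at row 1 would drop it
as.numeric(d[3] - d[1])   # 16

# Sliding rule: a row starts a new event only if its gap to the previous row exceeds 15
starts_new <- c(TRUE, gaps > 15)
cumsum(starts_new)        # 1 1 1 1 2 2 3
```

Because each row is compared with its predecessor rather than with the first row of the event, one event can span more than 15 days in total (event 1 runs from 2021-01-01 to 2021-02-01).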

The output should be:

ID	Date	Event_ID
1	2021-01-01	1
1	2021-01-05	1
1	2021-01-17	1
1	2021-02-01	1
1	2021-02-18	2
1	2021-02-28	2
1	2021-03-30	3
2	2021-01-01	4
2	2021-01-14	4
2	2021-02-15	5

I could also use data.table in R (depending on efficiency/performance).

Answer 1

An R solution could use a dplyr approach together with the rleid function from data.table:

library(dplyr)
library(data.table)
df %>% group_by(ID) %>%
  mutate(Date = as.Date(Date)) %>% #mutating Date column as Date
  arrange(ID, Date) %>% #arranging the rows in order
  mutate(Event = if_else(is.na(Date - lag(Date)), Date - Date, Date - lag(Date)),
         Event = paste(ID, cumsum(if_else(Event > 15, 1, 0)), sep = "_")) %>%
  ungroup() %>% #since the event numbers are not to be created group-wise
  mutate(Event = rleid(Event))
# A tibble: 10 x 3
      ID Date       Event
   <dbl> <date>     <int>
 1     1 2021-01-01     1
 2     1 2021-01-05     1
 3     1 2021-01-17     1
 4     1 2021-02-01     1
 5     1 2021-02-18     2
 6     1 2021-02-28     2
 7     1 2021-03-30     3
 8     2 2021-01-01     4
 9     2 2021-01-14     4
10     2 2021-02-15     5
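The last step relies on rleid(), which numbers consecutive runs of identical values 1, 2, 3, ... in row order. The sketch below imitates that with base R's rle() so the step can be seen without data.table; the helper name run_id and the example keys are hypothetical, with the keys being what the paste(ID, cumsum(...)) step produces for the sample data. It also shows why the answer pastes the ID in front of the per-ID counter: without it, equal counters on both sides of an ID boundary would merge into one run.

```r
# Hypothetical helper: number consecutive runs of equal values 1, 2, 3, ...
# (mimics the idea behind data.table::rleid for a single vector)
run_id <- function(x) {
  r <- rle(x)                           # lengths of runs of equal values
  rep(seq_along(r$lengths), r$lengths)  # repeat each run number over its run
}

# Keys of the form "<ID>_<per-ID event counter>" for the sample data
keys <- c("1_0", "1_0", "1_0", "1_0", "1_1", "1_1", "1_2", "2_0", "2_0", "2_1")
run_id(keys)                          # 1 1 1 1 2 2 3 4 4 5

# Why the ID prefix matters: ID 1 ends with counter 0 and ID 2 starts with counter 0
run_id(c(0, 0, 0))                    # 1 1 1  (the two IDs wrongly merge)
run_id(c("1_0", "1_0", "2_0"))        # 1 1 2  (ID change starts a new event)
```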

Answer 2

Here is a data.table approach in R:

library(data.table)
#Change to data.table
setDT(df)
#Order the dataset
setorder(df, ID, Date)
#Set flag to TRUE/FALSE if difference is greater than 15
df[, greater_than_15 := c(TRUE, diff(Date) > 15), ID]
#Take cumulative sum to create consecutive event id.
df[, Event_ID := cumsum(greater_than_15)]
df

#    ID       Date greater_than_15 Event_ID
# 1:  1 2021-01-01            TRUE        1
# 2:  1 2021-01-05           FALSE        1
# 3:  1 2021-01-17           FALSE        1
# 4:  1 2021-02-01           FALSE        1
# 5:  1 2021-02-18            TRUE        2
# 6:  1 2021-02-28           FALSE        2
# 7:  1 2021-03-30            TRUE        3
# 8:  2 2021-01-01            TRUE        4
# 9:  2 2021-01-14           FALSE        4
#10:  2 2021-02-15            TRUE        5

Data

df <- structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2), 
Date = structure(c(18628, 18632, 18644, 18659, 18676, 18686, 18716, 
18628, 18641, 18673), class = "Date")), 
row.names = c(NA, -10L), class = "data.frame")
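If data.table is not available, the same sort, flag, cumulative-sum idea can be written in base R. This is a sketch, not part of the original answer: the helper name add_event_id and its max_gap parameter are hypothetical. The rows are shuffled first to show that the sort step takes care of the unsorted table mentioned in the question.

```r
# Sample data from the question
df <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  Date = as.Date(c("2021-01-01", "2021-01-05", "2021-01-17", "2021-02-01",
                   "2021-02-18", "2021-02-28", "2021-03-30",
                   "2021-01-01", "2021-01-14", "2021-02-15"))
)

# Hypothetical helper: chained (sliding) event ids, new event when the gap > max_gap
add_event_id <- function(df, max_gap = 15) {
  df <- df[order(df$ID, df$Date), ]                     # sort by ID, then Date
  gap <- c(Inf, as.numeric(diff(df$Date)))              # days since previous row
  new_id <- c(TRUE, df$ID[-1] != df$ID[-nrow(df)])      # first row of each ID
  df$Event_ID <- cumsum(new_id | gap > max_gap)         # new event on ID change or big gap
  rownames(df) <- NULL
  df
}

# Shuffle the rows to mimic an unsorted table, then assign event ids
shuffled <- df[c(10, 3, 7, 1, 8, 5, 2, 9, 6, 4), ]
result <- add_event_id(shuffled)
result$Event_ID   # 1 1 1 1 2 2 3 4 4 5
```

The ID-change flag is needed because the gap computed across an ID boundary compares dates from different IDs (here 2021-03-30 followed by 2021-01-01, a negative gap), so the gap alone would not start a new event there.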

Tags: sql, oracle, r, sliding-window