Populate the session ID, generating a new session ID when the timestamp difference between two consecutive events is more than 30 minutes

Input - read from an existing Hive or Redshift table

user   |    Timestamp    |  SessionId
---------------------------------------
u1     |    10:00AM      |      ?    
u1     |    10:05AM      |      ?    
u1     |    10:10AM      |      ?    
u1     |    10:15AM      |      ?    
u1     |    11:40AM      |      ?    
u1     |    11:50AM      |      ?    
u1     |    12:15PM      |      ?

Expected output

user   |    Timestamp    |  SessionId
---------------------------------------
u1     |    10:00AM      |      s1    
u1     |    10:05AM      |      s1    
u1     |    10:10AM      |      s1    
u1     |    10:15AM      |      s1    
u1     |    11:40AM      |      s2    
u1     |    11:50AM      |      s2    
u1     |    12:15PM      |      s3

How can we solve this so that the existing table is updated, using Hive or Redshift?

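Convert the timestamp to unix_timestamp (seconds), use the lag() function to get the previous timestamp, calculate the difference and assign new_session=1 when it is more than 30 minutes, then calculate the running sum of new_session to get the session ID.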

with mydata as (
select 'u1' as `user`,'10:00AM' `timestamp` union all    
select 'u1','10:05AM' union all    
select 'u1','10:10AM' union all    
select 'u1','10:15AM' union all    
select 'u1','11:40AM' union all    
select 'u1','11:50AM' union all    
select 'u1','12:15AM' -----------15 min after midnight
)

select `user`, `timestamp`, 
       concat('s',sum(new_session) over(partition by `user` order by `timestamp`)) as session_id
from 
(
select --calculate new_session flag based on difference between ts and prev_ts
      `user`, `timestamp`, ts, prev_ts,
      case when ((ts-prev_ts)/60 > 30) or prev_ts is NULL then 1 end as new_session
from      
(
select `user`, `timestamp`, ts, 
      --calculate previous time
       lag(ts) over(partition by `user` order by ts) prev_ts
from
(
--convert time to seconds
select `user`, `timestamp`, unix_timestamp(`timestamp`,'hh:mma') as ts from mydata
)s --ts conversion
)s --prev_ts
)s --new_session

Result:

user    timestamp   session_id
u1      10:00AM      s1
u1      10:05AM      s1
u1      10:10AM      s1
u1      10:15AM      s1
u1      11:40AM      s2
u1      11:50AM      s2
u1      12:15AM      s3

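Note that I changed 12:15PM to 12:15AM to get the s3 session. In your data example 12:15PM is 15 minutes after noon, so the gap between 11:50AM and 12:15PM is 25 minutes, which does not trigger a new session. To get the s3 session as in your question, it should be 12:15AM. 12:15AM is 15 minutes after midnight; see the 12-hour_clock wiki.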
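
The gap arithmetic for the original 12:15PM data can be checked outside the database. The following is a minimal Python sketch, not part of the Hive solution: the helper name `assign_sessions` and the constant `GAP_SECONDS` are made up for illustration, and it assumes all times fall on the same day and that an English locale is in effect for parsing AM/PM. It applies the same rule as the new_session flag (first event, or a gap above 30 minutes, starts a session).

```python
from datetime import datetime

GAP_SECONDS = 30 * 60  # 30-minute session timeout from the question

def assign_sessions(times, fmt="%I:%M%p"):
    """Return (time string, session label) pairs in chronological order.

    Hypothetical helper for illustration; assumes same-day 12-hour times.
    """
    parsed = sorted((datetime.strptime(t, fmt), t) for t in times)
    labelled, session, prev = [], 0, None
    for ts, raw in parsed:
        # Same rule as the new_session flag: first event, or gap above 30 minutes
        if prev is None or (ts - prev).total_seconds() > GAP_SECONDS:
            session += 1
        labelled.append((raw, f"s{session}"))
        prev = ts
    return labelled

# Times exactly as given in the question, with 12:15PM (15 minutes after noon)
question_times = ["10:00AM", "10:05AM", "10:10AM", "10:15AM",
                  "11:40AM", "11:50AM", "12:15PM"]
result = assign_sessions(question_times)
for raw, sid in result:
    print(raw, sid)
# The last row prints "12:15PM s2": 25 minutes after 11:50AM, so no new session
```

With the question's data as written, the 12:15PM event stays in s2, which is why the sample value had to change for an s3 session to appear.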