Create session IDs based on the time difference between events for individual users
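I need to create session IDs for individual users based on the time difference between the events they trigger. If the gap between two consecutive events of a user exceeds 60 minutes, that user should get a new session ID: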
User | Event_Time | Session Id |
---|---|---|
A | 2016-01-01 00:00:15 | 1 |
A | 2016-01-01 00:00:17 | 1 |
A | 2016-01-01 00:00:27 | 1 |
B | 2016-01-01 00:00:27 | 1 |
A | 2016-01-01 04:01:59 | 2 |
B | 2016-01-01 22:00:27 | 2 |
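You can treat this as a gaps-and-islands problem: compare each row with the previous row of the same user (obtained with the `lag` window function), mark the rows where the gap exceeds 60 minutes, and use a running sum of those marks as the session ID: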
```sql
-- sample data
WITH dataset (user, event_time) AS (
    VALUES ('A', timestamp '2016-01-01 00:00:15'),
           ('A', timestamp '2016-01-01 00:00:17'),
           ('A', timestamp '2016-01-01 00:00:27'),
           ('B', timestamp '2016-01-01 00:00:27'),
           ('A', timestamp '2016-01-01 04:01:59'),
           ('B', timestamp '2016-01-01 22:00:27')
)
-- query
SELECT user,
       event_time,
       -- 1 + cumulative count of "gap > 60 minutes" rows serves as the session id
       1 + sum(if(date_diff('minute', prev_event_time, event_time) > 60, 1, 0))
             over (partition by user order by event_time) as session_id
FROM (
    SELECT *,
           -- event_time of the previous row for the same user (NULL for the first row)
           lag(event_time) over (partition by user order by event_time) as prev_event_time
    FROM dataset
)
-- make the output order deterministic
ORDER BY user, event_time
```
Output:
user | event_time | session_id |
---|---|---|
A | 2016-01-01 00:00:15.000 | 1 |
A | 2016-01-01 00:00:17.000 | 1 |
A | 2016-01-01 00:00:27.000 | 1 |
A | 2016-01-01 04:01:59.000 | 2 |
B | 2016-01-01 00:00:27.000 | 1 |
B | 2016-01-01 22:00:27.000 | 2 |
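To make the gaps-and-islands idea concrete outside the database, here is a minimal Python sketch of the same logic. It is not part of the original answer, and `assign_sessions` is a hypothetical helper name. It sorts rows by user and time (the `partition by user order by event_time`), builds the previous-event column (`lag`), flags gaps over 60 minutes, and takes 1 + the running sum of the flags as the session ID. One assumption to note: the sketch compares the exact elapsed time, whereas the SQL counts minutes with `date_diff`, so gaps only a few seconds over 60 minutes may be classified differently.

```python
from datetime import datetime, timedelta
from itertools import accumulate, groupby


def assign_sessions(rows, gap=timedelta(minutes=60)):
    """Hypothetical helper: rows are (user, event_time) pairs.

    Returns (user, event_time, session_id) tuples sorted by user, event_time.
    """
    result = []
    # Equivalent of "partition by user order by event_time".
    for user, group in groupby(sorted(rows), key=lambda r: r[0]):
        times = [t for _, t in group]
        # lag(event_time): previous event of the same user, None for the first one.
        prev = [None] + times[:-1]
        # 1 when the gap to the previous event exceeds `gap`, else 0 (None lag -> 0).
        # Assumption: exact elapsed time is compared, not whole minutes.
        flags = [1 if p is not None and t - p > gap else 0 for p, t in zip(prev, times)]
        # 1 + running sum of the flags = session id.
        for t, running in zip(times, accumulate(flags)):
            result.append((user, t, 1 + running))
    return result


ts = datetime.fromisoformat
data = [
    ("A", ts("2016-01-01 00:00:15")),
    ("A", ts("2016-01-01 00:00:17")),
    ("A", ts("2016-01-01 00:00:27")),
    ("B", ts("2016-01-01 00:00:27")),
    ("A", ts("2016-01-01 04:01:59")),
    ("B", ts("2016-01-01 22:00:27")),
]

for user, event_time, session_id in assign_sessions(data):
    print(user, event_time, session_id)
```

For the sample data this yields the same session IDs as the query output above: user A's fourth event comes about 4 hours after the third, and user B's second event comes 22 hours after the first, so each starts session 2.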