How do I use windowing functions in SQL to persist a record?
I have a dataset, and I'm trying to create a "session id" based on the timestamp at which a certain event occurs (in my example, a load).
My data:
userid event timestamp
xyz load '2016-12-01 08:21:13:000'
xyz view '2016-12-01 08:21:14:000'
xyz view '2016-12-01 08:21:16:000'
xyz exit '2016-12-01 08:21:17:000'
xyz load '2016-12-02 08:01:13:000'
xyz view '2016-12-02 08:01:16:000'
abc load '2016-12-01 08:11:13:000'
abc view '2016-12-01 08:11:14:000'
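What I want to achieve is a new column named session_start_timestamp that marks each row with the timestamp of that user's most recent "load".
I know how to do this by creating a subset table (taking the minimum timestamp and self-joining), but is there a lag/lead/max/partition function that could do this instead?
The final output should look like this: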
userid event timestamp session_start_timestamp
xyz load '2016-12-01 08:21:13:000' '2016-12-01 08:21:13:000'
xyz view '2016-12-01 08:21:14:000' '2016-12-01 08:21:13:000'
xyz view '2016-12-01 08:21:16:000' '2016-12-01 08:21:13:000'
xyz exit '2016-12-01 08:21:17:000' '2016-12-01 08:21:13:000'
xyz load '2016-12-02 08:01:13:000' '2016-12-02 08:01:13:000'
xyz view '2016-12-02 08:01:16:000' '2016-12-02 08:01:13:000'
abc load '2016-12-01 08:11:13:000' '2016-12-01 08:11:13:000'
abc view '2016-12-01 08:11:14:000' '2016-12-01 08:11:13:000'
This is a gaps-and-islands problem:
SQL DEMO (postgresql)
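- First you compute the gaps, i.e. the break points.
- Then you use a cumulative SUM() to number the groups.
- Then you select the MIN() time of each group.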
WITH gap as (
SELECT *, CASE WHEN "event" = 'load' THEN 1 ELSE 0 END as gap
FROM Table1
), island as (
SELECT *, SUM(gap) OVER (PARTITION BY "userid" ORDER BY "timestamp" ) as grp
FROM gap
)
SELECT *, MIN("timestamp") OVER (PARTITION BY "userid", "grp") as new_timestamp
FROM island
Output
You can merge the first two queries:
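To check the result against the sample data, here is a minimal sketch that loads the rows from the question into an in-memory database and runs the query above, also selecting the intermediate gap and grp columns. It uses Python's sqlite3 module as a stand-in for PostgreSQL, which is an assumption made for portability (window functions need SQLite 3.25 or later). The helper name run_query and storing the timestamps as text are illustrative choices, not part of the demo.

```python
import sqlite3

# Sample rows from the question. Timestamps are kept as text (an assumption);
# they sort correctly because every value uses the same fixed-width format.
ROWS = [
    ("xyz", "load", "2016-12-01 08:21:13:000"),
    ("xyz", "view", "2016-12-01 08:21:14:000"),
    ("xyz", "view", "2016-12-01 08:21:16:000"),
    ("xyz", "exit", "2016-12-01 08:21:17:000"),
    ("xyz", "load", "2016-12-02 08:01:13:000"),
    ("xyz", "view", "2016-12-02 08:01:16:000"),
    ("abc", "load", "2016-12-01 08:11:13:000"),
    ("abc", "view", "2016-12-01 08:11:14:000"),
]

# Same three steps as the answer: flag each load, running SUM of the flags,
# then MIN timestamp per (userid, grp).
QUERY = """
WITH gap AS (
    SELECT *, CASE WHEN "event" = 'load' THEN 1 ELSE 0 END AS gap
    FROM Table1
), island AS (
    SELECT *, SUM(gap) OVER (PARTITION BY "userid" ORDER BY "timestamp") AS grp
    FROM gap
)
SELECT "userid", "event", "timestamp", gap, grp,
       MIN("timestamp") OVER (PARTITION BY "userid", grp) AS new_timestamp
FROM island
ORDER BY "userid", "timestamp"
"""


def run_query(sql):
    """Hypothetical helper: build Table1 in memory and return the query rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Table1 ("userid" TEXT, "event" TEXT, "timestamp" TEXT)'
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_query(QUERY):
        print(row)
```

For user xyz the grp column comes out as 1, 1, 1, 1, 2, 2: each load bumps the running sum, so every row up to the next load shares a group, and the MIN("timestamp") of that group is the load time.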
WITH island as (
SELECT *, SUM (CASE WHEN "event" = 'load' THEN 1 ELSE 0 END )
OVER (PARTITION BY "userid" ORDER BY "timestamp" ) as grp
FROM Table1
)
SELECT *, MIN("timestamp") OVER (PARTITION BY "userid", "grp") as new_timestamp
FROM island
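As for the max/partition idea raised in the question, a running MAX over only the load timestamps may also do the job without the intermediate grp column. This is a sketch of that variant, not the query from the demo. It is run here through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption; it needs SQLite 3.25 or later), and the names RUNNING_MAX_QUERY and session_starts are made up for the example.

```python
import sqlite3

# Sample rows from the question, with timestamps kept as fixed-width text
# (an assumption) so that string comparison matches chronological order.
ROWS = [
    ("xyz", "load", "2016-12-01 08:21:13:000"),
    ("xyz", "view", "2016-12-01 08:21:14:000"),
    ("xyz", "view", "2016-12-01 08:21:16:000"),
    ("xyz", "exit", "2016-12-01 08:21:17:000"),
    ("xyz", "load", "2016-12-02 08:01:13:000"),
    ("xyz", "view", "2016-12-02 08:01:16:000"),
    ("abc", "load", "2016-12-01 08:11:13:000"),
    ("abc", "view", "2016-12-01 08:11:14:000"),
]

# The CASE yields the timestamp for load rows and NULL otherwise. With an
# ORDER BY in the window, MAX runs from the first row of the partition up to
# the current row and skips NULLs, so it returns the latest load seen so far.
RUNNING_MAX_QUERY = """
SELECT "userid", "event", "timestamp",
       MAX(CASE WHEN "event" = 'load' THEN "timestamp" END)
           OVER (PARTITION BY "userid" ORDER BY "timestamp")
           AS session_start_timestamp
FROM Table1
ORDER BY "userid", "timestamp"
"""


def session_starts():
    """Hypothetical helper: build Table1 in memory and run the variant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE Table1 ("userid" TEXT, "event" TEXT, "timestamp" TEXT)'
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(RUNNING_MAX_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in session_starts():
        print(row)
```

One difference to keep in mind: rows that come before a user's first load get NULL here, whereas the grouped query puts them in group 0 and gives them the earliest timestamp of that group.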