Conditional LEAD/LAG with no sequence guarantee
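How do I write a conditional lead/lag when the preceding or following row that lead/lag reaches is not guaranteed to meet a particular condition? In my case, I'm looking at website traffic.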
Sample data (prior_path and prior_event are the target fields; with my conditions I can't get to prior_event):
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
| sessionid | hit | type | path | event | prior_path | prior_event |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
| 1001 | 1 | event | www.whosebug.com | hover | | |
| 1001 | 2 | page | www.whosebug.com | | | hover |
| 1001 | 3 | event | www.whosebug.com | load | | |
| 1001 | 4 | event | www.whosebug.com | blur | | load |
| 1001 | 5 | event | www.whosebug.com | click | | blur |
| 1001 | 6 | page | www.whosebug.com/post/10 | | www.whosebug.com | click |
| 1001 | 7 | event | www.whosebug.com/post/10#details | offer | | |
| 1001 | 8 | page | www.whosebug.com/post/confirm | | www.whosebug.com/post/10 | offer |
| 1001 | 9 | page | www.whosebug.com/questions/10 | | www.whosebug.com/post/confirm | offer |
| 1001 | 10 | event | www.whosebug.com/questions/10 | exit | | |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
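prior_path: the last path where type = page, filled in only for hits of type page
prior_event: the last event where type = event, for all hit types
Note hits 8 and 9: the "offer" event repeats because that event is what led to both of those pages.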
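Before reaching for window functions, it can help to pin the target columns down procedurally. The sketch below is plain Python over the sample rows and reproduces the expected prior_path and prior_event columns. The rules it encodes are my reading of the expected output rather than of the wording above: an event hit gets the event of the immediately preceding hit (so hits 3, 7 and 10 stay empty, because a page precedes each of them), while a page hit gets the most recent event before it. The function name `prior_columns` is hypothetical.

```python
# Sample rows from the question: (hit, type, path, event); None marks an empty cell.
ROWS = [
    (1, "event", "www.whosebug.com", "hover"),
    (2, "page", "www.whosebug.com", None),
    (3, "event", "www.whosebug.com", "load"),
    (4, "event", "www.whosebug.com", "blur"),
    (5, "event", "www.whosebug.com", "click"),
    (6, "page", "www.whosebug.com/post/10", None),
    (7, "event", "www.whosebug.com/post/10#details", "offer"),
    (8, "page", "www.whosebug.com/post/confirm", None),
    (9, "page", "www.whosebug.com/questions/10", None),
    (10, "event", "www.whosebug.com/questions/10", "exit"),
]


def prior_columns(rows):
    """Return (prior_path, prior_event) lists for one session's rows, sorted by hit.

    Rules inferred from the expected output (an interpretation, not the spec):
    - page hits: prior_path = path of the previous page hit,
      prior_event = event of the most recent event hit before this one.
    - event hits: prior_path = None,
      prior_event = event of the immediately preceding hit (None after a page).
    """
    prior_path, prior_event = [], []
    last_page_path = None   # path of the most recent page hit
    last_event = None       # event of the most recent event hit
    prev_row_event = None   # event column of the immediately preceding hit
    for _hit, typ, path, event in rows:
        if typ == "page":
            prior_path.append(last_page_path)
            prior_event.append(last_event)
            last_page_path = path
        else:
            prior_path.append(None)
            prior_event.append(prev_row_event)
            last_event = event
        prev_row_event = event
    return prior_path, prior_event


paths, events = prior_columns(ROWS)
for (hit, typ, _, _), p, e in zip(ROWS, paths, events):
    print(hit, typ, p, e)
```

Writing the rules out this way makes the asymmetry visible: event hits only look one row back, while page hits look back to the last event hit however far away it is, which is exactly the case a plain lag() cannot cover.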
prior_path looks simple enough:
SELECT LAG(path) OVER (PARTITION BY sessionid, type ORDER BY hit) FROM my_table
but I'm not sure how to get prior_event.
I think you just need lag() and some conditional logic:
select . . .,
(case when type = 'page'
then lag(path) over (partition by sessionid, type order by hit)
end) as prior_path,
lag(event) over (partition by sessionid order by hit) as prior_event
from my_table;
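To see how this simpler query behaves on the sample rows, here is a sketch that runs it with Python's built-in sqlite3 module as a stand-in for Hive, assuming the bundled SQLite is 3.25 or newer (the first release with window functions). The elided column list is replaced by `t.hit` for illustration. Hit 9 comes back with NULL instead of offer, because the row right before it (hit 8) is a page with no event, so a single-step lag() never reaches hit 7.

```python
import sqlite3

# Window functions need SQLite 3.25.0 or newer.
assert sqlite3.sqlite_version_info >= (3, 25, 0)

# Sample rows from the question: (sessionid, hit, type, path, event).
ROWS = [
    (1001, 1, "event", "www.whosebug.com", "hover"),
    (1001, 2, "page", "www.whosebug.com", None),
    (1001, 3, "event", "www.whosebug.com", "load"),
    (1001, 4, "event", "www.whosebug.com", "blur"),
    (1001, 5, "event", "www.whosebug.com", "click"),
    (1001, 6, "page", "www.whosebug.com/post/10", None),
    (1001, 7, "event", "www.whosebug.com/post/10#details", "offer"),
    (1001, 8, "page", "www.whosebug.com/post/confirm", None),
    (1001, 9, "page", "www.whosebug.com/questions/10", None),
    (1001, 10, "event", "www.whosebug.com/questions/10", "exit"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (sessionid INTEGER, hit INTEGER, type TEXT, path TEXT, event TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
SELECT t.hit,
       CASE WHEN type = 'page'
            THEN LAG(path) OVER (PARTITION BY sessionid, type ORDER BY hit)
       END AS prior_path,
       LAG(event) OVER (PARTITION BY sessionid ORDER BY hit) AS prior_event
FROM my_table t
ORDER BY t.hit
"""
result = conn.execute(QUERY).fetchall()
prior_events = [row[2] for row in result]
for row in result:
    print(row)
```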
You already have the right expression for prior_path. You just need to wrap it in a conditional expression.
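The wrapper matters because the lag over (sessionid, type) is also evaluated for event hits, where it returns the path of the previous event hit. This sketch, again assuming sqlite3 with SQLite 3.25+ as a stand-in for Hive, compares the bare and the wrapped expression on the sample rows.

```python
import sqlite3

# Window functions need SQLite 3.25.0 or newer.
assert sqlite3.sqlite_version_info >= (3, 25, 0)

# Sample rows from the question: (sessionid, hit, type, path, event).
ROWS = [
    (1001, 1, "event", "www.whosebug.com", "hover"),
    (1001, 2, "page", "www.whosebug.com", None),
    (1001, 3, "event", "www.whosebug.com", "load"),
    (1001, 4, "event", "www.whosebug.com", "blur"),
    (1001, 5, "event", "www.whosebug.com", "click"),
    (1001, 6, "page", "www.whosebug.com/post/10", None),
    (1001, 7, "event", "www.whosebug.com/post/10#details", "offer"),
    (1001, 8, "page", "www.whosebug.com/post/confirm", None),
    (1001, 9, "page", "www.whosebug.com/questions/10", None),
    (1001, 10, "event", "www.whosebug.com/questions/10", "exit"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (sessionid INTEGER, hit INTEGER, type TEXT, path TEXT, event TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
SELECT hit,
       -- bare expression from the question: also filled in for event hits
       LAG(path) OVER (PARTITION BY sessionid, type ORDER BY hit) AS bare,
       -- wrapped version: NULL unless the current hit is a page
       CASE WHEN type = 'page'
            THEN LAG(path) OVER (PARTITION BY sessionid, type ORDER BY hit)
       END AS wrapped
FROM my_table
ORDER BY hit
"""
rows = conn.execute(QUERY).fetchall()
raw = {hit: bare for hit, bare, _ in rows}
wrapped = {hit: w for hit, _, w in rows}
for hit, bare, w in rows:
    print(hit, bare, w)
```

For example, hit 10 (an event) gets www.whosebug.com/post/10#details from the bare expression, because hit 7 is the previous event hit; the wrapped expression leaves it NULL as the target table expects.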
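As for prior_event, it is indeed a bit more involved. I would suggest the following approach:
For events, we can just use lag().
For pages, one option is a gaps-and-islands technique: first define groups with a conditional sum that increments every time an event is encountered, then use first_value() within each group.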
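The grouping step can be traced by hand. This plain-Python sketch mimics the two window functions on the sample rows for a single session: `itertools.accumulate` plays the role of the running conditional sum, and keeping the first event seen in each group plays the role of `first_value(event)` ordered by hit. The group numbers it produces match the grp column in the demo output.

```python
from itertools import accumulate

# (hit, type, event) for session 1001 from the sample data; None marks an empty cell.
ROWS = [
    (1, "event", "hover"),
    (2, "page", None),
    (3, "event", "load"),
    (4, "event", "blur"),
    (5, "event", "click"),
    (6, "page", None),
    (7, "event", "offer"),
    (8, "page", None),
    (9, "page", None),
    (10, "event", "exit"),
]

# Step 1: running conditional sum, like
#   sum(case when type = 'event' then 1 else 0 end) over (order by hit).
# It goes up by one at every event hit, so each group starts at an event hit
# and also contains the page hits that follow it.
grp = list(accumulate(1 if typ == "event" else 0 for _, typ, _ in ROWS))

# Step 2: the first event in each group, like
#   first_value(event) over (partition by grp order by hit).
# setdefault keeps only the first value seen for each group.
first_event = {}
for (_, _, event), g in zip(ROWS, grp):
    first_event.setdefault(g, event)

# Step 3: page hits take the first event of their group.
prior_event_for_pages = {
    hit: first_event[g]
    for (hit, typ, _), g in zip(ROWS, grp)
    if typ == "page"
}

print(grp)                    # [1, 1, 2, 3, 4, 4, 5, 5, 5, 6]
print(prior_event_for_pages)  # {2: 'hover', 6: 'click', 8: 'offer', 9: 'offer'}
```

Because hits 8 and 9 share group 5 with hit 7, both pages pick up offer, which is the repetition the question asks for.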
This should do what you want:
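select
    t.*,
    -- prior_path: only filled in for page hits
    case when type = 'page'
        then lag(path) over(partition by sessionid, type order by hit)
    end prior_path,
    case type
        -- page hits: the event that opened this hit's group
        when 'page'
            then first_value(event) over(partition by sessionid, grp order by hit)
        -- event hits: the event of the immediately preceding hit
        when 'event'
            then lag(event) over(partition by sessionid order by hit)
    end prior_event
from (
    select
        t.*,
        -- running count of event hits: each event starts a new group
        -- that also contains the page hits following it
        sum(case when type = 'event' then 1 else 0 end)
            over(partition by sessionid order by hit) grp
    from mytable t
) t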
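To check the query without a Hive or Postgres instance, the sketch below runs it against Python's sqlite3 module, assuming SQLite 3.25 or newer. The only changes are selecting hit, grp and the two target columns instead of `t.*`, and adding an `order by`. The rows it returns match the expected prior_path and prior_event columns from the question.

```python
import sqlite3

# Window functions need SQLite 3.25.0 or newer.
assert sqlite3.sqlite_version_info >= (3, 25, 0)

# Sample rows from the question: (sessionid, hit, type, path, event).
ROWS = [
    (1001, 1, "event", "www.whosebug.com", "hover"),
    (1001, 2, "page", "www.whosebug.com", None),
    (1001, 3, "event", "www.whosebug.com", "load"),
    (1001, 4, "event", "www.whosebug.com", "blur"),
    (1001, 5, "event", "www.whosebug.com", "click"),
    (1001, 6, "page", "www.whosebug.com/post/10", None),
    (1001, 7, "event", "www.whosebug.com/post/10#details", "offer"),
    (1001, 8, "page", "www.whosebug.com/post/confirm", None),
    (1001, 9, "page", "www.whosebug.com/questions/10", None),
    (1001, 10, "event", "www.whosebug.com/questions/10", "exit"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (sessionid INTEGER, hit INTEGER, type TEXT, path TEXT, event TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)

QUERY = """
select
    t.hit,
    t.grp,
    case when type = 'page'
        then lag(path) over(partition by sessionid, type order by hit)
    end prior_path,
    case type
        when 'page'
            then first_value(event) over(partition by sessionid, grp order by hit)
        when 'event'
            then lag(event) over(partition by sessionid order by hit)
    end prior_event
from (
    select
        t.*,
        sum(case when type = 'event' then 1 else 0 end)
            over(partition by sessionid order by hit) grp
    from mytable t
) t
order by t.hit
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```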
Demo on DB Fiddle (there is no Hive fiddle available in the wild, so I used Postgres, but this works in Hive as well):
sessionid | hit | type | path | event | grp | prior_path | prior_event
--------: | --: | :---- | :------------------------------------ | :---- | --: | :--------------------------------- | :----------
1001 | 1 | event | www.whosebug.com | hover | 1 | null | null
1001 | 2 | page | www.whosebug.com | null | 1 | null | hover
1001 | 3 | event | www.whosebug.com | load | 2 | null | null
1001 | 4 | event | www.whosebug.com | blur | 3 | null | load
1001 | 5 | event | www.whosebug.com | click | 4 | null | blur
1001 | 6 | page | www.whosebug.com/post/10 | null | 4 | www.whosebug.com | click
1001 | 7 | event | www.whosebug.com/post/10#details | offer | 5 | null | null
1001 | 8 | page | www.whosebug.com/post/confirm | null | 5 | www.whosebug.com/post/10 | offer
1001 | 9 | page | www.whosebug.com/questions/10 | null | 5 | www.whosebug.com/post/confirm | offer
1001 | 10 | event | www.whosebug.com/questions/10 | exit | 6 | null | null