Conditional LEAD/LAG with no sequence guarantee

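How can I write a conditional lead/lag when the preceding or following row is not guaranteed to meet a specific condition? In my case, I'm looking at web traffic.

Sample data (prior_path and prior_event are the target fields; with my conditions I can't get to prior_event):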

+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
| sessionid | hit | type  |                 path                  | event |             prior_path             | prior_event |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+
|      1001 |   1 | event | www.whosebug.com                 | hover |                                    |             |
|      1001 |   2 | page  | www.whosebug.com                 |       |                                    | hover       |
|      1001 |   3 | event | www.whosebug.com                 | load  |                                    |             |
|      1001 |   4 | event | www.whosebug.com                 | blur  |                                    | load        |
|      1001 |   5 | event | www.whosebug.com                 | click |                                    | blur        |
|      1001 |   6 | page  | www.whosebug.com/post/10         |       | www.whosebug.com              | click       |
|      1001 |   7 | event | www.whosebug.com/post/10#details | offer |                                    |             |
|      1001 |   8 | page  | www.whosebug.com/post/confirm    |       | www.whosebug.com/post/10      | offer       |
|      1001 |   9 | page  | www.whosebug.com/questions/10    |       | www.whosebug.com/post/confirm | offer       |
|      1001 |  10 | event | www.whosebug.com/questions/10    | exit  |                                    |             |
+-----------+-----+-------+---------------------------------------+-------+------------------------------------+-------------+

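prior_path: the last path where type = page, populated for page hits only
prior_event: the last event where type = event, populated for all hit types

Notice that on hits 8 and 9 the "offer" event is repeated, because that event led to both of those pages.

prior_path seems simple enough: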

SELECT LAG(path) OVER (PARTITION BY sessionid, type ORDER BY hit) FROM my_table

But I'm not sure how to get prior_event.

I think you just need lag() and some conditional logic:

select . . .,
       (case when type = 'page'
             then lag(path) over (partition by sessionid, type order by hit)
        end) as prior_path,
       lag(event) over (partition by sessionid order by hit) as prior_event
from my_table;

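You already have the right expression for prior_path. You just need to wrap it in a conditional expression.

As for prior_event, it is indeed a bit more complicated. I would suggest the following approach:

  • for events, we can just use lag()

  • for pages, one option is a gaps-and-islands technique: first define groups with a conditional sum that increments every time an event row is met, then use first_value() within each group:

This should do what you want: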
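To see what the conditional wrapper changes, here is a minimal sketch, assuming PostgreSQL and a four-row excerpt of the question's data inlined as a CTE (the `unwrapped` column name is made up for illustration). Because the window is partitioned by type, the bare lag(path) also returns a value on event rows: hit 3 picks up the path of hit 1, the previous event. The case expression blanks that out, so only page hits carry a prior_path.

```sql
-- Hypothetical excerpt of the question's rows, inlined so that no table is needed.
with my_table (sessionid, hit, type, path) as (
    values
        (1001, 1, 'event', 'www.whosebug.com'),
        (1001, 2, 'page',  'www.whosebug.com'),
        (1001, 3, 'event', 'www.whosebug.com'),
        (1001, 6, 'page',  'www.whosebug.com/post/10')
)
select
    hit,
    type,
    -- bare lag: returns the previous row of the same type, for events as well
    lag(path) over (partition by sessionid, type order by hit) as unwrapped,
    -- wrapped lag: same expression, kept only on page rows
    case when type = 'page'
         then lag(path) over (partition by sessionid, type order by hit)
    end as prior_path
from my_table
order by hit;
```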

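select
    t.*,
    -- prior_path: path of the previous page hit, populated for page hits only
    case when type = 'page'
        then lag(path) over(partition by sessionid, type order by hit)
    end as prior_path,
    case type
        -- pages: the event row that opens the current group
        when 'page'
            then first_value(event) over(partition by sessionid, grp order by hit)
        -- events: event of the immediately preceding hit (null if that hit was a page)
        when 'event'
            then lag(event) over(partition by sessionid order by hit)
    end as prior_event
from (
    select
        t.*,
        -- running count of event rows: each event starts a new group
        sum(case when type = 'event' then 1 else 0 end)
            over(partition by sessionid order by hit) as grp
    from my_table t
) t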

Demo on DB Fiddle (for lack of Hive fiddles in the wild, I used Postgres, but this should work on Hive too):

sessionid | hit | type  | path                                  | event | grp | prior_path                         | prior_event
--------: | --: | :---- | :------------------------------------ | :---- | --: | :--------------------------------- | :----------
     1001 |   1 | event | www.whosebug.com                 | hover |   1 | null                               | null       
     1001 |   2 | page  | www.whosebug.com                 | null  |   1 | null                               | hover      
     1001 |   3 | event | www.whosebug.com                 | load  |   2 | null                               | null       
     1001 |   4 | event | www.whosebug.com                 | blur  |   3 | null                               | load       
     1001 |   5 | event | www.whosebug.com                 | click |   4 | null                               | blur       
     1001 |   6 | page  | www.whosebug.com/post/10         | null  |   4 | www.whosebug.com              | click      
     1001 |   7 | event | www.whosebug.com/post/10#details | offer |   5 | null                               | null       
     1001 |   8 | page  | www.whosebug.com/post/confirm    | null  |   5 | www.whosebug.com/post/10      | offer      
     1001 |   9 | page  | www.whosebug.com/questions/10    | null  |   5 | www.whosebug.com/post/confirm | offer      
     1001 |  10 | event | www.whosebug.com/questions/10    | exit  |   6 | null                               | null
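
For anyone who wants to rerun this without the fiddle, the following is a self-contained sketch, assuming PostgreSQL. The sample rows from the question are inlined as a CTE named my_table, and the derived-table step is written as a second CTE (the name `grouped` is my own choice, not part of the original answer). It is meant to reproduce the grp, prior_path and prior_event columns shown above, not to be a tuned production query.

```sql
-- Sample rows copied from the question, inlined so the script needs no existing table.
with my_table (sessionid, hit, type, path, event) as (
    values
        (1001,  1, 'event', 'www.whosebug.com',                 'hover'),
        (1001,  2, 'page',  'www.whosebug.com',                 null),
        (1001,  3, 'event', 'www.whosebug.com',                 'load'),
        (1001,  4, 'event', 'www.whosebug.com',                 'blur'),
        (1001,  5, 'event', 'www.whosebug.com',                 'click'),
        (1001,  6, 'page',  'www.whosebug.com/post/10',         null),
        (1001,  7, 'event', 'www.whosebug.com/post/10#details', 'offer'),
        (1001,  8, 'page',  'www.whosebug.com/post/confirm',    null),
        (1001,  9, 'page',  'www.whosebug.com/questions/10',    null),
        (1001, 10, 'event', 'www.whosebug.com/questions/10',    'exit')
),
-- Step 1: running count of event rows; every event opens a new group,
-- and the pages that follow it inherit the same grp value.
grouped as (
    select
        t.*,
        sum(case when type = 'event' then 1 else 0 end)
            over (partition by sessionid order by hit) as grp
    from my_table t
)
-- Step 2: pages read the first row of their group (the opening event),
-- events read the immediately preceding hit.
select
    sessionid,
    hit,
    type,
    event,
    grp,
    case when type = 'page'
         then lag(path) over (partition by sessionid, type order by hit)
    end as prior_path,
    case type
        when 'page'
            then first_value(event) over (partition by sessionid, grp order by hit)
        when 'event'
            then lag(event) over (partition by sessionid order by hit)
    end as prior_event
from grouped
order by sessionid, hit;
```

A session that starts with a page hit ends up in group 0, which contains no event row, so first_value(event) returns null there and prior_event stays empty for that page.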