Replacing self joins by window functions

I am working with the following sample data:

       dt   |   ship_id     |   audit_id   | action

 2022-01-02 |     1351      |     id1      | destroy
 2022-01-01 |     1351      |     id1      | create
 2021-12-12 |     3457      |     id2      | create
 2021-12-16 |     3457      |     id2      | destroy
 2021-12-28 |     3457      |     id3      | create
 

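Some context: for a given ship_id and audit_id, the action column requires that an entry is created before it is destroyed. For example, the entry with ship_id=3457 and audit_id=id2 was created on December 12 and destroyed on December 16.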

The goal is to get, for each dt on which an entry is created, how many audit_ids were created before it and how many audit_ids were destroyed before it.

Sample output:

       dt  |  created_cnt  |  destroyed_cnt

2022-01-01 |     2         |    1
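To pin down the requirement, here is a minimal brute-force sketch in plain Python (not part of the original question). It assumes an entry is identified by its (ship_id, audit_id) pair and that "before" means strictly earlier than the creation date. The sample output above only shows the 2022-01-01 row. Under these assumptions the same definition also produces rows for the earlier creation dates.

```python
from datetime import date

# Sample rows from the question: (dt, ship_id, audit_id, action)
ROWS = [
    (date(2022, 1, 2), 1351, "id1", "destroy"),
    (date(2022, 1, 1), 1351, "id1", "create"),
    (date(2021, 12, 12), 3457, "id2", "create"),
    (date(2021, 12, 16), 3457, "id2", "destroy"),
    (date(2021, 12, 28), 3457, "id3", "create"),
]


def expected_counts(rows):
    """Map each creation date to (created_cnt, destroyed_cnt).

    Counts entries whose creation / destruction date is strictly
    earlier than the creation date being reported.
    """
    created = {}    # (ship_id, audit_id) -> creation date
    destroyed = {}  # (ship_id, audit_id) -> destruction date
    for dt, ship_id, audit_id, action in rows:
        key = (ship_id, audit_id)
        if action == "create":
            created[key] = dt
        elif action == "destroy":
            destroyed[key] = dt

    result = {}
    for creation_dt in sorted(set(created.values())):
        created_cnt = sum(1 for d in created.values() if d < creation_dt)
        destroyed_cnt = sum(1 for d in destroyed.values() if d < creation_dt)
        result[creation_dt] = (created_cnt, destroyed_cnt)
    return result


for dt, (c, d) in expected_counts(ROWS).items():
    print(dt, c, d)
# 2021-12-12 0 0
# 2021-12-28 1 1
# 2022-01-01 2 1
```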

One possible approach uses a self join:

with cte as (
 -- one row per entry, with its creation and removal dates
 select
  audit_id,
  ship_id,
  max(case when action = 'create' then dt end) as creation_time,
  max(case when action = 'destroy' then dt end) as removal_time

 from table
 group by 1,2)

select
 t1.creation_time as creation_date,
 count(t2.audit_id) as created_cnt,
 count(distinct case when t2.removal_time < t1.creation_time then t2.audit_id end) as destroyed_cnt

from cte as t1
left join cte as t2 on t1.creation_time > t2.creation_time
group by 1
order by 1 desc;
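A runnable sketch of the self join above follows. It uses SQLite through Python's standard sqlite3 module as a stand-in for the questioner's database, which is an assumption. The table name events is hypothetical, since table in the question is a reserved-word placeholder. Positional group by / order by are spelled out as column names for portability. SQLite 3.8.3+ is assumed for the CTE. The query returns one row per creation date, and the 2022-01-01 row matches the sample output.

```python
import sqlite3

# Question's sample data; the table name "events" is hypothetical.
SAMPLE = [
    ("2022-01-02", 1351, "id1", "destroy"),
    ("2022-01-01", 1351, "id1", "create"),
    ("2021-12-12", 3457, "id2", "create"),
    ("2021-12-16", 3457, "id2", "destroy"),
    ("2021-12-28", 3457, "id3", "create"),
]

SELF_JOIN_SQL = """
with cte as (
    select audit_id,
           ship_id,
           max(case when action = 'create' then dt end) as creation_time,
           max(case when action = 'destroy' then dt end) as removal_time
    from events
    group by audit_id, ship_id
)
select t1.creation_time as creation_date,
       count(t2.audit_id) as created_cnt,
       count(distinct case when t2.removal_time < t1.creation_time
                           then t2.audit_id end) as destroyed_cnt
from cte as t1
left join cte as t2 on t1.creation_time > t2.creation_time
group by t1.creation_time
order by creation_date desc
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table events (dt text, ship_id integer, audit_id text, action text)")
conn.executemany("insert into events values (?, ?, ?, ?)", SAMPLE)
rows = conn.execute(SELF_JOIN_SQL).fetchall()
print(rows)
# [('2022-01-01', 2, 1), ('2021-12-28', 1, 1), ('2021-12-12', 0, 0)]
```

The join compares every entry with every earlier-created entry, so its work grows roughly quadratically with the number of entries. That is consistent with the slowdown described next.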

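But because the table is large, this self join is slowing things down. Is it possible to use some kind of window function here instead of the join? Any help is appreciated.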

Use the window frame over(order by dt rows between unbounded preceding and 1 preceding), so that each row only counts the rows before it. Check this solution:

with data as (
    -- VALUES columns are named column1..column4 by default, so alias them
    select  column1 dt,  column2 ship,  column3 audit,  column4 action
    from values('2022-01-02', 1, 'id1', 'destroy')
    , ('2022-01-01', 1, 'id1', 'create')
    , ('2021-12-12', 2, 'id2', 'create')
    , ('2021-12-16', 2, 'id2', 'destroy')
    , ('2021-12-28', 2, 'id3', 'create')
)

select dt
    , sum(iff(action='create',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) created_cnt
    , sum(iff(action='destroy',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data
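To check the window approach against the expected output, here is a minimal sketch that runs the same idea in SQLite via Python's sqlite3 module. SQLite is an assumption, and window functions there require SQLite 3.25+. SQLite has no IFF(), so case when stands in for it. Two additions go beyond the answer. First, coalesce(..., 0), because sum() over an empty frame (the first row) returns NULL. Second, an outer query that keeps only create rows, because window functions are evaluated before WHERE. The table name events is hypothetical.

```python
import sqlite3

# Question's sample data; the table name "events" is hypothetical.
SAMPLE = [
    ("2022-01-02", 1351, "id1", "destroy"),
    ("2022-01-01", 1351, "id1", "create"),
    ("2021-12-12", 3457, "id2", "create"),
    ("2021-12-16", 3457, "id2", "destroy"),
    ("2021-12-28", 3457, "id3", "create"),
]

WINDOW_SQL = """
select dt, created_cnt, destroyed_cnt
from (
    select dt,
           action,
           -- running counts over all strictly earlier rows (by dt order)
           coalesce(sum(case when action = 'create' then 1 else 0 end)
                    over (order by dt rows between unbounded preceding and 1 preceding), 0)
               as created_cnt,
           coalesce(sum(case when action = 'destroy' then 1 else 0 end)
                    over (order by dt rows between unbounded preceding and 1 preceding), 0)
               as destroyed_cnt
    from events
) as w
-- filter after the window is computed, so destroy rows still feed the counts
where action = 'create'
order by dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table events (dt text, ship_id integer, audit_id text, action text)")
conn.executemany("insert into events values (?, ?, ?, ?)", SAMPLE)
rows = conn.execute(WINDOW_SQL).fetchall()
print(rows)
# [('2021-12-12', 0, 0), ('2021-12-28', 1, 1), ('2022-01-01', 2, 1)]
```

The table is scanned and sorted once instead of being joined to itself, which is where the speedup over the self join is expected to come from.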

An alternative answer using PIVOT instead of IFF(). I'd be interested to know which approach works best for your problem.

Code (copy | paste | run):

with data as (
-- VALUES columns are named column1..column4 by default, so alias them
select  column1 dt,  column2 ship,  column3 audit,  column4 action
from values('2022-01-02', 1, 'id1', 'destroy')
, ('2022-01-01', 1, 'id1', 'create')
, ('2021-12-12', 2, 'id2', 'create')
, ('2021-12-16', 2, 'id2', 'destroy')
, ('2021-12-28', 2, 'id3', 'create')
)

select 
  dt 
-- the pivoted columns are named 'create' and 'destroy' (quotes included)
, sum("'create'") over (order by dt rows between unbounded preceding and 1 preceding) created_cnt
, sum("'destroy'") over (order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from 
data pivot ( count (audit) for action in ('create','destroy'));
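SQLite has no PIVOT, so this sketch substitutes conditional aggregation grouped by dt. Like the pivot, it yields one count column per action, and then the running sums are taken over those per-day counts. This is an assumed equivalent, not the answer's exact query. The Snowflake pivot also keeps ship as an implicit grouping column, which this sketch drops. The table name events is hypothetical, and SQLite 3.25+ is assumed for window functions.

```python
import sqlite3

# Question's sample data; the table name "events" is hypothetical.
SAMPLE = [
    ("2022-01-02", 1351, "id1", "destroy"),
    ("2022-01-01", 1351, "id1", "create"),
    ("2021-12-12", 3457, "id2", "create"),
    ("2021-12-16", 3457, "id2", "destroy"),
    ("2021-12-28", 3457, "id3", "create"),
]

PIVOT_LIKE_SQL = """
with per_day as (
    -- one row per dt with a count column per action (pivot-like shape)
    select dt,
           sum(case when action = 'create' then 1 else 0 end) as creates,
           sum(case when action = 'destroy' then 1 else 0 end) as destroys
    from events
    group by dt
)
select dt,
       -- running totals over strictly earlier days; empty frame -> 0
       coalesce(sum(creates) over (order by dt rows between unbounded preceding and 1 preceding), 0)
           as created_cnt,
       coalesce(sum(destroys) over (order by dt rows between unbounded preceding and 1 preceding), 0)
           as destroyed_cnt
from per_day
order by dt
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table events (dt text, ship_id integer, audit_id text, action text)")
conn.executemany("insert into events values (?, ?, ?, ?)", SAMPLE)
rows = conn.execute(PIVOT_LIKE_SQL).fetchall()
print(rows)
# [('2021-12-12', 0, 0), ('2021-12-16', 1, 0), ('2021-12-28', 1, 1),
#  ('2022-01-01', 2, 1), ('2022-01-02', 3, 1)]
```

One possible benefit of pre-aggregating per day: a rows frame over raw rows puts events that share a dt in arbitrary order, so one could count another. With one row per dt, same-day events never count each other as "before".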

Tweaking Filipe's answer: sum(iff(action='create',1,0)) can be replaced with count_if(action='create')

which becomes:

with data as (
    -- VALUES columns are named column1..column4 by default, so alias them
    select  column1 dt,  column2 ship,  column3 audit,  column4 action
    from values('2022-01-02', 1, 'id1', 'destroy')
       ,('2022-01-01', 1, 'id1', 'create')
       ,('2021-12-12', 2, 'id2', 'create')
       ,('2021-12-16', 2, 'id2', 'destroy')
       ,('2021-12-28', 2, 'id3', 'create')
)

select dt
    ,count_if(action='create') over (order by dt rows between unbounded preceding and 1 preceding) created_cnt
    ,count_if(action='destroy') over (order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data