Replacing self joins with window functions
I am working with the following sample data:
dt | ship_id | audit_id | action
2022-01-02 | 1351 | id1 | destroy
2022-01-01 | 1351 | id1 | create
2021-12-12 | 3457 | id2 | create
2021-12-16 | 3457 | id2 | destroy
2021-12-28 | 3457 | id3 | create
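To give some context: for a given ship_id and audit_id, the action column implies that an entry must be created before it can be destroyed. For example, the entry with ship_id=3457 and audit_id=id2 was created on December 12 and destroyed on December 16.
The goal is to get, for each dt on which a create action happened, how many audit_ids had been created before it and how many audit_ids had been destroyed before it.
Sample output: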
dt | created_cnt | destroyed_cnt
2022-01-01 | 2 | 1
One possible approach uses a self join:
with cte as (
select
audit_id,
ship_id,
max(case when action = 'create' then dt end) as creation_time,
max(case when action = 'destroy' then dt end) as removal_time
from table -- placeholder for the real table name
group by 1,2)
select
t1.creation_time as creation_date,
count(t2.audit_id) as created_cnt,
count(distinct case when t2.removal_time < t1.creation_time then t2.audit_id end) as destroyed_cnt
from cte as t1
left join cte as t2 on t1.creation_time > t2.creation_time
group by 1
order by 1 desc;
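However, because the table is large, this self join is slowing things down. Is it possible to use some kind of window function here instead of the join? Any help is appreciated.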
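As a baseline for comparing alternatives, the self join can be tried on the sample data. The sketch below is only an illustration: it runs the query in SQLite through Python's sqlite3 module instead of Snowflake, and the table name events and the function name self_join_counts are hypothetical. The positional group by and order by references are spelled out as column names.

```python
import sqlite3

# Sample rows from the question: (dt, ship_id, audit_id, action)
SAMPLE = [
    ("2022-01-02", 1351, "id1", "destroy"),
    ("2022-01-01", 1351, "id1", "create"),
    ("2021-12-12", 3457, "id2", "create"),
    ("2021-12-16", 3457, "id2", "destroy"),
    ("2021-12-28", 3457, "id3", "create"),
]

# The self join from the question; "events" is a hypothetical table name.
SELF_JOIN = """
with cte as (
    select audit_id, ship_id,
           max(case when action = 'create' then dt end) as creation_time,
           max(case when action = 'destroy' then dt end) as removal_time
    from events
    group by audit_id, ship_id
)
select t1.creation_time as creation_date,
       count(t2.audit_id) as created_cnt,
       count(distinct case when t2.removal_time < t1.creation_time
                           then t2.audit_id end) as destroyed_cnt
from cte as t1
left join cte as t2 on t1.creation_time > t2.creation_time
group by t1.creation_time
order by creation_date desc
"""


def self_join_counts(rows):
    """Load the rows into an in-memory SQLite table and run the self join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (dt text, ship_id integer, audit_id text, action text)"
    )
    conn.executemany("insert into events values (?, ?, ?, ?)", rows)
    result = conn.execute(SELF_JOIN).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in self_join_counts(SAMPLE):
        print(row)
```

The first row returned is ('2022-01-01', 2, 1), which matches the sample output in the question.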
Use over(order by dt rows between unbounded preceding and 1 preceding).
Check this solution:
with data as (
select $1::date as dt, $2 as ship, $3 as audit, $4 as action
from values('2022-01-02', 1, 'id1', 'destroy')
, ('2022-01-01', 1, 'id1', 'create')
, ('2021-12-12', 2, 'id2', 'create')
, ('2021-12-16', 2, 'id2', 'destroy')
, ('2021-12-28', 2, 'id3', 'create')
)
-- running totals over all earlier rows; the frame ends 1 preceding, so the current row is excluded
select dt
, sum(iff(action='create',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) created_cnt
, sum(iff(action='destroy',1,0)) over(order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data
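The window query returns one row per event, destroy rows included, and NULL for the earliest row because its frame is empty. A minimal sketch of one way to reduce that to the shape asked for in the question follows, assuming each audit_id has at most one create and one destroy row and that dt values are unique. It runs in SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of Snowflake, so iff is written as a case expression; the table name events and the function name window_counts are hypothetical.

```python
import sqlite3

# Sample rows from the question: (dt, ship_id, audit_id, action)
SAMPLE = [
    ("2022-01-02", 1351, "id1", "destroy"),
    ("2022-01-01", 1351, "id1", "create"),
    ("2021-12-12", 3457, "id2", "create"),
    ("2021-12-16", 3457, "id2", "destroy"),
    ("2021-12-28", 3457, "id3", "create"),
]

# Running counts over all strictly earlier rows, then keep only create rows.
# coalesce turns the NULL of the empty first frame into 0.
WINDOW_QUERY = """
select dt, created_cnt, destroyed_cnt
from (
    select dt, action,
           coalesce(sum(case when action = 'create' then 1 else 0 end)
               over (order by dt
                     rows between unbounded preceding and 1 preceding), 0)
               as created_cnt,
           coalesce(sum(case when action = 'destroy' then 1 else 0 end)
               over (order by dt
                     rows between unbounded preceding and 1 preceding), 0)
               as destroyed_cnt
    from events
)
where action = 'create'
order by dt desc
"""


def window_counts(rows):
    """Load the rows into an in-memory SQLite table and run the window query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (dt text, ship_id integer, audit_id text, action text)"
    )
    conn.executemany("insert into events values (?, ?, ?, ?)", rows)
    result = conn.execute(WINDOW_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in window_counts(SAMPLE):
        print(row)
```

The filter sits in the outer query on purpose: a where clause in the inner query would be applied before the window functions and would drop the destroy rows from the running totals. If several events can share the same dt, a rows frame orders the ties arbitrarily, so the counts for those rows are not deterministic.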
An alternative answer using PIVOT instead of IFF(). I'd be interested to know which approach works best for your problem.
Code (copy | paste | run):
with data as (
select $1::date as dt, $2 as ship, $3 as audit, $4 as action
from values('2022-01-02', 1, 'id1', 'destroy')
, ('2022-01-01', 1, 'id1', 'create')
, ('2021-12-12', 2, 'id2', 'create')
, ('2021-12-16', 2, 'id2', 'destroy')
, ('2021-12-28', 2, 'id3', 'create')
)
-- pivot names its output columns after the quoted values, hence "'create'" and "'destroy'"
select
dt
, sum("'create'") over (order by dt rows between unbounded preceding and 1 preceding) created_cnt
, sum("'destroy'") over (order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from
data pivot ( count (audit) for action in ('create','destroy'));
A small tweak to Filipe's answer: sum(iff(action='create',1,0))
can be replaced with count_if(action='create'),
so it becomes:
with data as (
select $1::date as dt, $2 as ship, $3 as audit, $4 as action
from values('2022-01-02', 1, 'id1', 'destroy')
,('2022-01-01', 1, 'id1', 'create')
,('2021-12-12', 2, 'id2', 'create')
,('2021-12-16', 2, 'id2', 'destroy')
,('2021-12-28', 2, 'id3', 'create')
)
select dt
,count_if(action='create') over (order by dt rows between unbounded preceding and 1 preceding) created_cnt
,count_if(action='destroy') over (order by dt rows between unbounded preceding and 1 preceding) destroyed_cnt
from data