SQL - map rows based on date window and condition
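I asked a very similar question and am now trying to handle an edge case that came up. In the original question, related cases had matching open_date/close_date pairs. In the example below, however, case4 and case3 should be linked even though case3 closes after case4 opens. We can tell the two cases belong together because case3 cannot be the last case in a sequence: it has the status deferred, and its close_date is very close to the time the next deferred case was opened. Is there a good way to account for this situation when matching cases on open_date/close_date?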
case_id open_date close_date user_id type status
case5 2021-06-01 2021-08-25 user1 request complete
case4 2021-05-05 2021-06-01 user1 request deferred
case3 2021-03-01 2021-05-12 user1 request deferred
case2 2020-09-15 2021-03-01 user1 request deferred
case1 2020-09-01 2020-09-15 user1 request deferred
Another edge case has come up, shown below, where two cases are related but have no matching open_date/close_date; they were simply opened one right after the other.
case_id open_date close_date user_id type status
case3 2022-01-20 null user1 request pending
case2 2021-10-04 2022-01-20 user1 request deferred
case1 2021-10-03 2021-12-12 user1 request deferred
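The solution is just a minor adjustment (based on the original solution provided by @MikhailBerlyant) to allow for, and detect, overlap between consecutive cases.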
The fiddle (with original and new problem)
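Note the inequality: open_date > lead(close_date). A case starts a new chain only when it opens strictly after the previous case closed, so both an exact match and an overlap keep it in the same chain.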
with your_table (case_id, open_date, close_date, user_id, type) as (
select 'case5', '2021-06-01', '2021-08-25', 'user1', 'request' union all
select 'case4', '2021-05-05', '2021-06-01', 'user1', 'request' union all
select 'case3', '2021-03-01', '2021-05-12', 'user1', 'request' union all
select 'case2', '2020-09-15', '2021-03-01', 'user1', 'request' union all
select 'case1', '2020-09-01', '2020-09-15', 'user1', 'request'
)
select *,
case row_number() over(partition by user_id, type, map_id order by open_date)
when 1 then 'new case'
when count(1) over(partition by user_id, type, map_id) then 'last deferred case'
else 'deferred case'
end as status
from (
select *,
SUM(new_case) over(partition by user_id, type order by open_date) as map_id
from (
select *,
COALESCE(open_date > lead(close_date) over(partition by user_id, type order by open_date desc), 1) new_case
from your_table
) AS t1
) AS t2
ORDER BY open_date DESC
;
Result:
case_id | open_date | close_date | user_id | type | new_case | map_id | status |
---|---|---|---|---|---|---|---|
case5 | 2021-06-01 | 2021-08-25 | user1 | request | 0 | 1 | last deferred case |
case4 | 2021-05-05 | 2021-06-01 | user1 | request | 0 | 1 | deferred case |
case3 | 2021-03-01 | 2021-05-12 | user1 | request | 0 | 1 | deferred case |
case2 | 2020-09-15 | 2021-03-01 | user1 | request | 0 | 1 | deferred case |
case1 | 2020-09-01 | 2020-09-15 | user1 | request | 1 | 1 | new case |
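The query above is only run against the first example, so it may help to trace the same rule on the second edge case from the question. The following is a rough Python sketch of the logic, not the SQL itself: the function name assign_map_ids is hypothetical, and it assumes ISO-formatted date strings (which compare correctly as text) and that a missing previous close_date starts a new chain, mirroring the COALESCE(..., 1) default.

```python
from itertools import groupby

# Rows from the second edge case in the question:
# (case_id, open_date, close_date, user_id, type)
rows = [
    ("case3", "2022-01-20", None, "user1", "request"),
    ("case2", "2021-10-04", "2022-01-20", "user1", "request"),
    ("case1", "2021-10-03", "2021-12-12", "user1", "request"),
]


def assign_map_ids(rows):
    """Mimic COALESCE(open_date > previous close_date, 1) plus the running SUM."""
    result = {}
    partition = lambda r: (r[3], r[4])  # partition by user_id, type
    for _, group in groupby(sorted(rows, key=partition), key=partition):
        map_id = 0
        prev_close = None
        # Walk each partition in chronological order of open_date.
        for case_id, open_date, close_date, *_ in sorted(group, key=lambda r: r[1]):
            # A new chain starts only if this case opens strictly after the
            # previous one closed; with no previous close_date it defaults to 1.
            new_case = 1 if prev_close is None or open_date > prev_close else 0
            map_id += new_case
            result[case_id] = (new_case, map_id)
            prev_close = close_date
    return result


mapping = assign_map_ids(rows)
for case_id in ("case3", "case2", "case1"):
    print(case_id, mapping[case_id])
```

Under these assumptions all three cases end up with the same map_id: case2 opens before case1 closes, and case3 opens on the day case2 closes, so neither comparison is strictly greater.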
You have to make a small adjustment to the BigQuery code in the inner FROM statement:
with your_table as (
select 'case5' as case_id, '2021-06-01' as open_date, '2021-08-25' as close_date, 'user1' as user_id, 'request' as type union all
select 'case4', '2021-05-05', '2021-06-01', 'user1', 'request' union all
select 'case3', '2021-03-01', '2021-05-12', 'user1', 'request' union all
select 'case2', '2020-09-15', '2021-03-01', 'user1', 'request' union all
select 'case1', '2020-09-01', '2020-09-15', 'user1', 'request'
)
select *,
case row_number() over(partition by user_id, type, map_id order by open_date)
when 1 then 'new case'
when count(1) over(partition by user_id, type, map_id) then 'last deferred case'
else 'deferred case'
end as status
from (
select * except(new_case),
countif(new_case) over(partition by user_id, type order by open_date) as map_id
from (
select *,
case when open_date <= lead(close_date) over(partition by user_id, type order by open_date desc) then false
when open_date != lead(close_date) over(partition by user_id, type order by open_date desc) then true
else False end new_case
from your_table
)
)
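I have tested this with the old scenario, the new scenario, and the two combined. This should solve your problem. I consulted the following documentation: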