SQL - map rows based on date window and condition

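I asked a very similar question before, and I'm now trying to handle the edge cases that have come up. In the original question, related cases had matching open_date/close_date pairs. In the example below, however, case4 and case3 should be linked even though case3 closes after case4 opens. We can tell that these two cases belong together because case3 cannot be the last case in the sequence: its status is deferred, and its close_date is very close to the time the next deferred case opened. Is there a good way to account for this situation when matching cases on open_date/close_date?
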
case_id open_date  close_date  user_id type     status      
case5   2021-06-01 2021-08-25  user1   request  complete
case4   2021-05-05 2021-06-01  user1   request  deferred
case3   2021-03-01 2021-05-12  user1   request  deferred
case2   2020-09-15 2021-03-01  user1   request  deferred
case1   2020-09-01 2020-09-15  user1   request  deferred

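Another edge case has come up, shown below: two cases are related but have no matching open_date/close_date. They were, however, opened immediately one after the other.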

case_id open_date  close_date  user_id type     status      
case3   2022-01-20 null        user1   request  pending
case2   2021-10-04 2022-01-20  user1   request  deferred
case1   2021-10-03 2021-12-12  user1   request  deferred

The solution is just a small tweak (based on the original solution provided by @MikhailBerlyant) that allows, and detects, overlap between consecutive cases.

The fiddle (with original and new problem)

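Note the inequality: open_date > lead(close_date). A case starts a new chain only if it opens strictly after the previous case (in open_date order) closed. A case that opens on or before that close date, i.e. one that touches or overlaps it, stays in the same chain.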
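The rule behind that inequality can be sketched in plain Python, outside any database. `assign_groups` is a hypothetical helper (not part of the original answer) that mirrors the SQL for a single user_id/type: it walks the cases in open_date order and starts a new chain only when a case opens strictly after its predecessor closed. The data is a hypothetical combination of the question's two examples under one user, with ids renamed so they stay unique.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# (case_id, open_date, close_date); close_date None means the case is still open.
Case = Tuple[str, date, Optional[date]]


def assign_groups(cases: List[Case]) -> Dict[str, int]:
    """Number the chains of linked cases for one user_id/type.

    A case starts a new chain only when it opens strictly AFTER the previous
    case (in open_date order) closed, like open_date > LEAD(close_date) over
    open_date DESC. Touching or overlapping dates keep the chain going.
    """
    groups: Dict[str, int] = {}
    group_id = 0
    prev_close: Optional[date] = None
    for case_id, open_date, close_date in sorted(cases, key=lambda c: c[1]):
        # No predecessor (or a predecessor without a close date): the SQL
        # comparison is NULL and COALESCE(..., 1) turns it into a new chain.
        if prev_close is None or open_date > prev_close:
            group_id += 1
        groups[case_id] = group_id
        prev_close = close_date
    return groups


d = date.fromisoformat
# Hypothetical combined data: a* = first example, b* = second example.
CASES: List[Case] = [
    ("a1", d("2020-09-01"), d("2020-09-15")),
    ("a2", d("2020-09-15"), d("2021-03-01")),
    ("a3", d("2021-03-01"), d("2021-05-12")),
    ("a4", d("2021-05-05"), d("2021-06-01")),  # opens before a3 closes
    ("a5", d("2021-06-01"), d("2021-08-25")),
    ("b1", d("2021-10-03"), d("2021-12-12")),  # gap after a5: new chain
    ("b2", d("2021-10-04"), d("2022-01-20")),  # opens before b1 closes
    ("b3", d("2022-01-20"), None),
]

groups = assign_groups(CASES)
print(groups)
```

Both edge cases (a4 overlapping a3, b2 overlapping b1) stay in their chains, and only the real gap between a5 and b1 starts chain 2.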

with your_table (case_id, open_date, close_date, user_id, type) as (
  select 'case5', '2021-06-01', '2021-08-25', 'user1', 'request' union all
  select 'case4', '2021-05-05', '2021-06-01', 'user1', 'request' union all
  select 'case3', '2021-03-01', '2021-05-12', 'user1', 'request' union all
  select 'case2', '2020-09-15', '2021-03-01', 'user1', 'request' union all 
  select 'case1', '2020-09-01', '2020-09-15', 'user1', 'request' 
)
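select *, 
  -- 3) label each case by its position within its chain
  case row_number() over(partition by user_id, type, map_id order by open_date) 
    when 1 then 'new case'
    when count(1) over(partition by user_id, type, map_id) then 'last deferred case'
    else 'deferred case'
  end as status
from (
  -- 2) a running total of the flags numbers the chains (map_id)
  select *, 
    SUM(new_case) over(partition by user_id, type order by open_date) as map_id
  from (
    -- 1) flag a case as new only if it opens strictly after the previous case closed;
    --    the oldest case has no previous case, so COALESCE turns the NULL into 1
    select *, 
      COALESCE(open_date > lead(close_date) over(partition by user_id, type order by open_date desc), 1) new_case
    from your_table
  ) AS t1
) AS t2
ORDER BY open_date DESC
;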

Result:

case_id open_date  close_date user_id type    new_case map_id status
case5   2021-06-01 2021-08-25 user1   request 0        1      last deferred case
case4   2021-05-05 2021-06-01 user1   request 0        1      deferred case
case3   2021-03-01 2021-05-12 user1   request 0        1      deferred case
case2   2020-09-15 2021-03-01 user1   request 0        1      deferred case
case1   2020-09-01 2020-09-15 user1   request 1        1      new case
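To check this logic without a MySQL instance, here is a sketch that runs the same query in SQLite through Python's standard `sqlite3` module. Using SQLite instead of MySQL is an assumption of this sketch (window functions need SQLite 3.25 or later). The rows are the first example from the question, and only the columns needed for the comparison are selected.

```python
import sqlite3

# First example from the question (status column omitted, as in the answer's CTE).
# Dates are ISO-8601 strings, so plain string comparison orders them correctly.
ROWS = [
    ("case5", "2021-06-01", "2021-08-25", "user1", "request"),
    ("case4", "2021-05-05", "2021-06-01", "user1", "request"),
    ("case3", "2021-03-01", "2021-05-12", "user1", "request"),
    ("case2", "2020-09-15", "2021-03-01", "user1", "request"),
    ("case1", "2020-09-01", "2020-09-15", "user1", "request"),
]

# Same three steps as the MySQL query: flag, running sum, positional label.
QUERY = """
SELECT case_id, open_date, new_case, map_id,
  CASE ROW_NUMBER() OVER (PARTITION BY user_id, type, map_id ORDER BY open_date)
    WHEN 1 THEN 'new case'
    WHEN COUNT(1) OVER (PARTITION BY user_id, type, map_id) THEN 'last deferred case'
    ELSE 'deferred case'
  END AS status
FROM (
  SELECT *, SUM(new_case) OVER (PARTITION BY user_id, type ORDER BY open_date) AS map_id
  FROM (
    SELECT *,
      COALESCE(open_date > LEAD(close_date) OVER (
        PARTITION BY user_id, type ORDER BY open_date DESC), 1) AS new_case
    FROM your_table
  ) AS t1
) AS t2
ORDER BY open_date DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table "
    "(case_id TEXT, open_date TEXT, close_date TEXT, user_id TEXT, type TEXT)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

for row in result:
    print(row)
```

The printed rows should line up with the result table above: case4 gets new_case = 0 even though it opens before case3 closes.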

For BigQuery, you have to make a small adjustment to the code in the inner FROM subquery:

with your_table as ( 
  select 'case5' as case_id, '2021-06-01' as open_date, '2021-08-25' as close_date, 'user1' as user_id, 'request' as type union all
  select 'case4', '2021-05-05', '2021-06-01', 'user1', 'request' union all
  select 'case3', '2021-03-01', '2021-05-12', 'user1', 'request' union all
  select 'case2', '2020-09-15', '2021-03-01', 'user1', 'request' union all 
  select 'case1', '2020-09-01', '2020-09-15', 'user1', 'request' 
)
select *, 
  case row_number() over(partition by user_id, type, map_id order by open_date) 
    when 1 then 'new case'
    when count(1) over(partition by user_id, type, map_id) then 'last deferred case'
    else 'deferred case'
  end as status
from (
  select * except(new_case), 
    countif(new_case) over(partition by user_id, type order by open_date) as map_id 
  from (
    select *, 
      -- a case is new only if it opens strictly after the previous case closed;
      -- the oldest case has no previous case, so IFNULL makes it false (map_id starts at 0)
      ifnull(open_date > lead(close_date) over(partition by user_id, type order by open_date desc), false) as new_case
    from your_table
  )
)
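The BigQuery flavour can be checked the same way on the second edge case, again in SQLite from Python. Two substitutions are assumptions of this sketch, not BigQuery syntax: SQLite has no COUNTIF, so SUM over a 0/1 flag stands in for it, and IFNULL(..., 0) reproduces the BigQuery flag's behaviour of marking the oldest case as not new. Because that case gets 0 instead of 1, map_id starts at 0 here rather than 1, which changes the numbering but not the grouping.

```python
import sqlite3

# Second example from the question; case3 is still pending (NULL close_date).
ROWS = [
    ("case3", "2022-01-20", None, "user1", "request"),
    ("case2", "2021-10-04", "2022-01-20", "user1", "request"),
    ("case1", "2021-10-03", "2021-12-12", "user1", "request"),
]

# SUM over a 0/1 flag emulates BigQuery's COUNTIF over a boolean flag.
QUERY = """
SELECT case_id, open_date, new_case, map_id,
  CASE ROW_NUMBER() OVER (PARTITION BY user_id, type, map_id ORDER BY open_date)
    WHEN 1 THEN 'new case'
    WHEN COUNT(1) OVER (PARTITION BY user_id, type, map_id) THEN 'last deferred case'
    ELSE 'deferred case'
  END AS status
FROM (
  SELECT *, SUM(new_case) OVER (PARTITION BY user_id, type ORDER BY open_date) AS map_id
  FROM (
    SELECT *,
      IFNULL(open_date > LEAD(close_date) OVER (
        PARTITION BY user_id, type ORDER BY open_date DESC), 0) AS new_case
    FROM your_table
  ) AS t1
) AS t2
ORDER BY open_date DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table "
    "(case_id TEXT, open_date TEXT, close_date TEXT, user_id TEXT, type TEXT)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
conn.close()

for row in result:
    print(row)
```

All three cases land in the same chain: case2 opens before case1 closes, and case3 opens on the day case2 closes. The pending case3's own NULL close_date never matters, because each case is compared only with its predecessor's close_date.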

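I have tested this with the old scenario, the new scenario, and a combined scenario, and it should solve your problem. I consulted the following documentation: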