Group orders based on crossing date ranges

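I need to group orders together, but only those whose date ranges cross (overlap).

Scenario A.

  1. Order 1, 1.3.2020-30.6.2020
  2. Order 2, 1.5.2020-31.8.2020
  3. Order 3, 31.7.2020-31.10.2020
  4. Order 4, 31.7.2020-31.12.2020

So the output should be

  1. Order 1, Order 2
  2. Order 2, Order 3, Order 4

Order 1 is not grouped with orders 3 and 4, because its range does not cross theirs at all.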

Scenario B.

Same as above, plus one more order

  1. Order 5, 1.1.2020-31.12.2020

So the output would be

  1. Order 1, Order 2, Order 5
  2. Order 2, Order 3, Order 4, Order 5
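To make the expected output concrete, here is a minimal Python sketch of the grouping rule as I read it from the two scenarios: a group is a maximal set of orders that all share at least one common day. The function name `overlap_groups` and the dictionary layout are illustrative assumptions, not part of the original question, and date ranges are treated as inclusive.

```python
from datetime import date


def overlap_groups(orders):
    """Return maximal groups of orders that all share at least one common day.

    `orders` maps an order name to an inclusive (start, end) pair of dates.
    """
    # Any set of mutually overlapping ranges contains the latest start date
    # among them, so it is enough to inspect the days on which an order starts.
    candidates = set()
    for start, _ in orders.values():
        active = frozenset(
            name for name, (s, e) in orders.items() if s <= start <= e
        )
        candidates.add(active)

    # Keep groups of two or more orders that are not strictly contained
    # in another candidate group.
    maximal = [
        g for g in candidates
        if len(g) > 1 and not any(g < other for other in candidates)
    ]
    return sorted(sorted(g) for g in maximal)


scenario_a = {
    "order 1": (date(2020, 3, 1), date(2020, 6, 30)),
    "order 2": (date(2020, 5, 1), date(2020, 8, 31)),
    "order 3": (date(2020, 7, 31), date(2020, 10, 31)),
    "order 4": (date(2020, 7, 31), date(2020, 12, 31)),
}
scenario_b = {**scenario_a, "order 5": (date(2020, 1, 1), date(2020, 12, 31))}

print(overlap_groups(scenario_a))
print(overlap_groups(scenario_b))
```

With the dates above, this prints the two groups of scenario A and then the two groups of scenario B, each extended by order 5.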

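I tried using a self join to check which start dates fall within a given order's range. Within the range of order 1 there is only the start date of order 2 -> we have one group. Within the range of order 2 fall the start dates of both orders 3 and 4 -> we have a second group. But within the range of order 3 falls the start date of order 4, and vice versa -> this gives 2 more groups, and they are invalid, because order 2 also crosses their date ranges and should be included as well. That would produce 3 duplicates of a group that should appear only once in the desired output, so this approach fails.

Thanks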
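The failure described above can be reproduced outside the database. The following is a rough Python imitation of that self-join idea on the scenario A data, not the actual SQL that was tried; `naive_groups` is a hypothetical helper name.

```python
from datetime import date

orders = {
    "order 1": (date(2020, 3, 1), date(2020, 6, 30)),
    "order 2": (date(2020, 5, 1), date(2020, 8, 31)),
    "order 3": (date(2020, 7, 31), date(2020, 10, 31)),
    "order 4": (date(2020, 7, 31), date(2020, 12, 31)),
}


def naive_groups(orders):
    """For each order, collect the orders whose start date falls inside its range."""
    groups = {}
    for name, (start, end) in orders.items():
        groups[name] = sorted(
            other for other, (s, _) in orders.items() if start <= s <= end
        )
    return groups


for name, members in naive_groups(orders).items():
    print(name, "->", members)
# order 1 -> ['order 1', 'order 2']
# order 2 -> ['order 2', 'order 3', 'order 4']
# order 3 -> ['order 3', 'order 4']
# order 4 -> ['order 3', 'order 4']
```

The last two rows are the problem: they repeat each other and leave out order 2, even though order 2 crosses both ranges.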

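You can use MATCH_RECOGNIZE to find groups in which the start date of the next value is before or equal to the end date of all previous values in the group. You can then aggregate, and exclude any group that would be entirely contained within another group: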

WITH groups ( id, ids, start_date, end_date ) AS (
  SELECT id,
         LISTAGG( grp_id, ',' ) WITHIN GROUP ( ORDER BY start_date ),
         MIN( start_date ),
         MIN( end_date )
  FROM   (
    SELECT t.id,
           x.id AS grp_id,
           x.start_date,
           x.end_date
    FROM   table_name t
           INNER JOIN table_name x
           ON (
                   x.start_date >= t.start_date
               AND x.start_date <= t.end_date
              )
  )
  MATCH_RECOGNIZE (
    PARTITION BY id
    ORDER BY start_date
    MEASURES
      MATCH_NUMBER() AS mno
    ALL ROWS PER MATCH
    PATTERN ( FIRST_ROW GROUPED_ROWS* )
    DEFINE GROUPED_ROWS AS (
      GROUPED_ROWS.start_date <= MIN( end_date )
    )
  )
  WHERE mno = 1
  GROUP BY id
)
SELECT id,
       ids
FROM   groups g
WHERE  NOT EXISTS (
  SELECT 1
  FROM   groups x
  WHERE  g.ID <> x.ID
  AND    x.start_date <= g.start_date
  AND    g.end_date   <= x.end_date
)

Sample data:

CREATE TABLE table_name ( id, start_date, end_date ) AS
SELECT 'order 1', DATE '2020-03-01', DATE '2020-06-30' FROM DUAL UNION ALL
SELECT 'order 2', DATE '2020-05-01', DATE '2020-08-31' FROM DUAL UNION ALL
SELECT 'order 3', DATE '2020-07-31', DATE '2020-10-31' FROM DUAL UNION ALL
SELECT 'order 4', DATE '2020-07-31', DATE '2020-12-31' FROM DUAL;

Outputs:

ID      | IDS                    
:------ | :----------------------
order 2 | order 2,order 3,order 4
order 1 | order 1,order 2        

Then if you:

INSERT INTO table_name ( id, start_date, end_date )
VALUES ( 'order 5', DATE '2020-01-01', DATE '2020-12-31' );

The output would be:

ID      | IDS                    
:------ | :----------------------
order 2 | order 2,order 3,order 4
order 5 | order 5,order 1,order 2


The result of the MATCH_RECOGNIZE solution is not correct, because order 5 should be in both groups.
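One way to check a proposed grouping is to ask, for each group, whether some order outside it overlaps every member. For date ranges on a line, overlapping every member pairwise is enough to guarantee a shared day, so any such order belongs in the group. A small Python sketch under that reading, with the hypothetical helper names `overlaps` and `missing_members` and inclusive date ranges:

```python
from datetime import date

orders = {
    "order 1": (date(2020, 3, 1), date(2020, 6, 30)),
    "order 2": (date(2020, 5, 1), date(2020, 8, 31)),
    "order 3": (date(2020, 7, 31), date(2020, 10, 31)),
    "order 4": (date(2020, 7, 31), date(2020, 12, 31)),
    "order 5": (date(2020, 1, 1), date(2020, 12, 31)),
}


def overlaps(a, b):
    """True if two inclusive (start, end) ranges share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def missing_members(group, orders):
    """Orders outside `group` whose range overlaps every member of it."""
    return sorted(
        name for name, rng in orders.items()
        if name not in group and all(overlaps(rng, orders[m]) for m in group)
    )


print(missing_members(["order 2", "order 3", "order 4"], orders))  # ['order 5']
print(missing_members(["order 5", "order 1", "order 2"], orders))  # []
```

Applied to the two rows of the scenario B output, the check reports order 5 as missing from the group headed by order 2.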

I used some analytic functions to solve this:

-- create table

Create table cross_dates (order_id number, start_date date , end_date date);

-- insert data

insert into cross_dates values( 1, to_date('01.03.2020', 'dd.mm.yyyy'), to_date('30.06.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 2, to_date('01.05.2020', 'dd.mm.yyyy'), to_date( '31.08.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 3, to_date('31.07.2020', 'dd.mm.yyyy'), to_date( '31.08.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 4, to_date('31.07.2020', 'dd.mm.yyyy'), to_date( '31.10.2020', 'dd.mm.yyyy'));
insert into cross_dates values( 5, to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'));

-- SQL

select 'Order '|| min_order_id ||': ',  listagg( order_id, ',') within group (order by order_id)  list
from (
    select distinct min_order_id, order_id from (
        with  dates (cur_date, end_date, order_id, start_date) as (
              select start_date, end_date, order_id, start_date
              from cross_Dates
              union all
              select cur_date + 1, end_date, order_id,start_date
              from dates
              where cur_date < end_date )
    select d.order_id, 
           min(d.order_id) over(partition by greatest(d.start_date, cd.start_date)) min_order_id
    from dates d, cross_Dates cd
    where d.cur_date between cd.start_date and cd.end_date ))
group by min_order_id 
having count(*) > 1;

Result:

Order 1:    1,2,5
Order 2:    2,3,4,5

-- add a new column and update the old records

alter table cross_dates add (item varchar2(1)); 

update cross_dates set item = 'A'; 

-- insert new records for item B

insert into cross_dates values( 1, to_date('01.01.2020', 'dd.mm.yyyy'), to_date( '30.06.2020', 'dd.mm.yyyy'), 'B');
insert into cross_dates values( 1, to_date('01.07.2020', 'dd.mm.yyyy'), to_date( '31.12.2020', 'dd.mm.yyyy'), 'B');

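My assumptions:

  1. A and B are separate orders; even if their ranges cross, they are not in the same group
  2. Order 1 B has two records, one continuing the other; as I understand it, that is equivalent to a single order: Order 1 B 01.01.2020 - 31.12.2020

If my assumptions are correct, the SQL could look like this: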

 select distinct min_order_id, order_id, item from (
           with  dates (cur_date, end_date, order_id, start_date, item) as (
              select start_date, end_date, order_id, start_date, item
              from cross_Dates             
              union all
              select cur_date + 1, end_date, order_id,start_date, item
              from dates
              where cur_date < end_date )
    select d.order_id,  d.item,
           min(d.order_id) over(partition by greatest(d.start_date, cd.start_date),d.item) min_order_id
    from dates d, cross_Dates cd
    where d.cur_date between cd.start_date and cd.end_date and d.item = cd.item )
    order by item, min_order_id; 

Result:

MIN_ORDER_ID   ORDER_ID ITEM
------------ ---------- ----
           1          1 A
           1          2 A
           1          5 A
           2          2 A
           2          3 A
           2          4 A
           2          5 A
           5          5 A
           1          1 B

If my assumptions are not correct, please tell me what the result should look like in this case.

:)