Merging overlapping intervals in MySQL

    +------+------------+------------+
    | id   | start_date | end_date   |
    +------+------------+------------+
    |    1 | 2019-01-01 | 2019-01-12 |
    |    1 | 2019-01-10 | 2019-01-27 |
    |    1 | 2019-01-13 | 2019-01-15 |
    |    1 | 2019-01-18 | 2019-01-25 |
    |    1 | 2019-02-10 | 2019-02-15 |
    |    2 | 2019-01-10 | 2019-01-15 |
    +------+------------+------------+

How can I merge the overlapping intervals and get the following result in MySQL (8.x)?

    +------+------------+------------+
    | id   | start_date | end_date   |
    +------+------------+------------+
    |    1 | 2019-01-01 | 2019-01-27 |
    |    1 | 2019-02-10 | 2019-02-15 |
    |    2 | 2019-01-10 | 2019-01-15 |
    +------+------------+------------+

The following commands can be used to load the sample data into the table in MySQL:

    insert into interval_dates(id, start_date, end_date) values(1, '2019-01-01', '2019-01-12');
    insert into interval_dates(id, start_date, end_date) values(1, '2019-01-10', '2019-01-27');
    insert into interval_dates(id, start_date, end_date) values(1, '2019-01-13', '2019-01-15');
    insert into interval_dates(id, start_date, end_date) values(1, '2019-01-18', '2019-01-25');
    insert into interval_dates(id, start_date, end_date) values(1, '2019-02-10', '2019-02-15');
    insert into interval_dates(id, start_date, end_date) values(2, '2019-01-10', '2019-01-15');
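
The inserts above assume that `interval_dates` already exists. A minimal definition might look like the sketch below; the column types are an assumption inferred from the sample values, not something stated in the question.

```sql
-- Hypothetical table definition; types are assumed from the sample data.
-- No primary key is declared because id repeats across rows.
create table interval_dates (
    id         int  not null,
    start_date date not null,
    end_date   date not null
);
```

Run it once before the `insert` statements.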

Could you share an elegant solution that does not require inserting into an intermediate table in MySQL (8.x)?

Try this (the table is called `mytable` in the queries below):

    -- Note: assigning user variables inside a SELECT is deprecated as of MySQL 8.0.13.
    SELECT id, start_date, MAX(end_date) end_date
    FROM ( SELECT id,
                  -- start a new merged interval on a gap or when the id changes
                  @p_start := CASE WHEN (start_date > @p_end) OR (@p_id < id)
                                   THEN start_date
                                   ELSE @p_start
                                   END start_date,
                  -- extend the running end of the current merged interval
                  @p_end := CASE WHEN (end_date > @p_end) OR (@p_id < id)
                                 THEN end_date
                                 ELSE @p_end
                                 END end_date,
                  @p_id := id
           FROM mytable, ( SELECT @p_id := MIN(id)-1,
                                  @p_start := MIN(start_date) - INTERVAL 1 DAY,
                                  @p_end := MIN(start_date) - INTERVAL 1 DAY
                           FROM mytable ) variables
           ORDER BY id, start_date, end_date ) subquery
    GROUP BY id, start_date;

fiddle (this works even on 5.6).

I have not yet found any source data for which it produces a wrong result.

If you can use window functions, why not use them? If the LAG function alone is not good enough, I can offer the following two options:

    -- Option 1: count each start as +1 and each end as -1,
    -- then track how many intervals are open at every point.
    with
      a as (
        select id, start_date as ts, 1 as evt from mytable
        union all
        select id, end_date as ts, -1 from mytable
      ),
      b as (
        -- starts sort before ends on the same day, so touching intervals merge
        select *,
          sum(evt) over(partition by id order by ts, evt desc
                        rows unbounded preceding) as qnt
        from a
      ),
      c as (
        -- keep only the events that open or close a merged interval, then pair them
        select id, ts,
          floor((row_number() over(partition by id order by ts) - 1) / 2) as grp
        from b
        where (evt = 1 and qnt = 1) or (evt = -1 and qnt = 0)
      )
    select id, min(ts) as start_date, max(ts) as end_date
    from c
    group by id, grp;

    -- Option 2: a row starts a new group when its start_date is later
    -- than every end_date seen before it for the same id.
    with
      a as (
        select *,
          case
            when start_date <= max(end_date) over(
                                 partition by id
                                 order by start_date, end_date
                                 rows between unbounded preceding and 1 preceding
                               )
            then 0
            else 1
          end as started
        from mytable
      ),
      b as (
        -- the running sum of the flags numbers the merged groups
        select *,
          sum(started) over(partition by id
                            order by start_date, end_date
                            rows unbounded preceding) as grp
        from a
      )
    select id, min(start_date) as start_date, max(end_date) as end_date
    from b
    group by id, grp;

Demo.
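
To illustrate why a plain LAG comparison may not be enough, here is a minimal sketch (not one of the options above) that compares each row only with the end date of the row directly before it. It assumes the same `mytable` with the sample rows from the question.

```sql
-- Sketch only: compares each row with the immediately preceding row,
-- which is not sufficient when an earlier, longer interval is still open.
with
  a as (
    select *,
      case
        when start_date <= lag(end_date) over(partition by id
                                              order by start_date, end_date)
        then 0
        else 1
      end as started
    from mytable
  ),
  b as (
    select *,
      sum(started) over(partition by id
                        order by start_date, end_date
                        rows unbounded preceding) as grp
    from a
  )
select id, min(start_date) as start_date, max(end_date) as end_date
from b
group by id, grp
order by id, start_date;
```

On the sample data this returns an extra row for id 1 (2019-01-18 to 2019-01-25): the row before it, 2019-01-13 to 2019-01-15, ends earlier, even though 2019-01-10 to 2019-01-27 still covers it. Taking the running `max(end_date)` over all preceding rows, as the second option does, avoids that.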

I would also like to point out the possibility of using named windows in MySQL 8.
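
A minimal sketch of what that could look like, assuming the running-maximum query (the second option above) is the one being shortened. The window name `w` is an arbitrary choice, and the frame is added where the window is referenced because the named window itself declares none.

```sql
-- Sketch: the shared partition/order definition is declared once per query block.
with
  a as (
    select *,
      case
        when start_date <= max(end_date) over(
                             w rows between unbounded preceding and 1 preceding
                           )
        then 0
        else 1
      end as started
    from mytable
    window w as (partition by id order by start_date, end_date)
  ),
  b as (
    select *,
      sum(started) over(w rows unbounded preceding) as grp
    from a
    window w as (partition by id order by start_date, end_date)
  )
select id, min(start_date) as start_date, max(end_date) as end_date
from b
group by id, grp
order by id, start_date;
```

A named window is visible only within the query block that declares it, so each CTE repeats the `window` clause.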