Detect and merge date range successive overlaps in SQL
I need to detect and merge overlapping date ranges in a table, but only between successive rows; non-successive overlaps should be ignored.
CREATE TABLE konto (konto_nummer INTEGER, start_datum DATE, end_datum DATE);
INSERT INTO konto VALUES (1, '2020-01-01 00:00:00.000000', '2020-01-10 00:00:00.000000');
INSERT INTO konto VALUES (1, '2020-01-12 00:00:00.000000', '2020-01-20 00:00:00.000000');
INSERT INTO konto VALUES (2, '2020-01-01 00:00:00.000000', '2020-01-10 00:00:00.000000');
INSERT INTO konto VALUES (2, '2020-01-05 00:00:00.000000', '2020-01-20 00:00:00.000000');
INSERT INTO konto VALUES (2, '2020-01-15 00:00:00.000000', '2020-01-25 00:00:00.000000');
INSERT INTO konto VALUES (2, '2020-02-05 00:00:00.000000', '2020-02-20 00:00:00.000000');
INSERT INTO konto VALUES (3, '2020-01-01 00:00:00.000000', '2020-01-25 00:00:00.000000');
INSERT INTO konto VALUES (4, '2020-04-01 00:00:00.000000', '2020-04-10 00:00:00.000000');
INSERT INTO konto VALUES (4, '2020-04-05 00:00:00.000000', '2020-04-15 00:00:00.000000');
INSERT INTO konto VALUES (4, '2020-04-16 00:00:00.000000', '2020-04-25 00:00:00.000000');
INSERT INTO konto VALUES (4, '2020-04-20 00:00:00.000000', '2020-04-30 00:00:00.000000');
Rows with the same color have successive overlaps.
I tried the following:
SELECT
ROW_NUMBER () OVER (ORDER BY konto_nummer, start_datum, end_datum) AS RN,
konto_nummer,
start_datum,
end_datum,
MAX(end_datum) OVER (PARTITION BY konto_nummer ORDER BY start_datum, end_datum ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS Previousend_datum
FROM konto;
But it also merges non-successive overlaps.
This is a gaps-and-islands problem, and it takes several steps.
First, mark the gaps:
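The difference can be sketched in plain Python. The function names and the three ranges below are made up for illustration. The attempt's `MAX(end_datum) OVER (... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)` compares each row with the largest end date of *all* earlier rows, while a lag()-style check compares it only with the immediately preceding row:

```python
from datetime import date


def starts_by_running_max(rows):
    """New group when the start is after the max end of ALL earlier rows (cumulative MAX)."""
    flags, running_max = [], None
    for start, end in rows:
        flags.append(running_max is None or running_max < start)
        running_max = end if running_max is None else max(running_max, end)
    return flags


def starts_by_previous_row(rows):
    """New group when the start is after the end of the immediately preceding row (lag)."""
    flags, prev_end = [], None
    for start, end in rows:
        flags.append(prev_end is None or prev_end < start)
        prev_end = end
    return flags


# Hypothetical ranges sorted by start: A contains B and C, but B and C do not overlap.
rows = [
    (date(2020, 1, 1), date(2020, 1, 25)),   # A
    (date(2020, 1, 2), date(2020, 1, 5)),    # B
    (date(2020, 1, 10), date(2020, 1, 15)),  # C
]
print(starts_by_running_max(rows))   # [True, False, False]
print(starts_by_previous_row(rows))  # [True, False, True]
```

With the cumulative maximum, C is still attached to A even though its direct predecessor B ended on 2020-01-05; the previous-row check starts a new group at C.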
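with mark as (
select *,
-- true when the previous row ends before this row starts (a gap), NULL on the first row;
-- a range lying entirely inside its predecessor still counts as an overlap
lag(end_datum) over w < start_datum as island
from konto
window w as (partition by konto_nummer
order by start_datum, end_datum)
),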
Then, number the islands:
grps as (
select *,
sum(coalesce(island, true)::int) over w as grpnum
from mark
window w as (partition by konto_nummer
order by start_datum, end_datum)
)
Then group and aggregate:
select konto_nummer,
min(start_datum) as start_datum,
max(end_datum) as end_datum
from grps
group by konto_nummer, grpnum
order by 1, 2, 3;
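To check the three steps end to end without a PostgreSQL server, here is a sketch that runs the same query through Python's built-in `sqlite3` module on the question's sample data. Assumptions: SQLite stands in for PostgreSQL and needs window-function support (3.28 or newer for the named `window` clause); dates are stored as ISO-8601 text so they compare correctly; the `::int` cast is dropped because SQLite comparisons already yield 0/1; the gap test is written as `(lag(end_datum) over w) < start_datum`. The helper name `merge_successive` is hypothetical.

```python
import sqlite3

# Sample data from the question, dates as ISO-8601 text.
SAMPLE = [
    (1, "2020-01-01", "2020-01-10"),
    (1, "2020-01-12", "2020-01-20"),
    (2, "2020-01-01", "2020-01-10"),
    (2, "2020-01-05", "2020-01-20"),
    (2, "2020-01-15", "2020-01-25"),
    (2, "2020-02-05", "2020-02-20"),
    (3, "2020-01-01", "2020-01-25"),
    (4, "2020-04-01", "2020-04-10"),
    (4, "2020-04-05", "2020-04-15"),
    (4, "2020-04-16", "2020-04-25"),
    (4, "2020-04-20", "2020-04-30"),
]

QUERY = """
with mark as (
  select *,
         -- 1 = gap before this row, 0 = overlaps the previous row, NULL on the first row
         (lag(end_datum) over w) < start_datum as island
  from konto
  window w as (partition by konto_nummer order by start_datum, end_datum)
),
grps as (
  select *,
         -- running count of gaps = island number within each konto_nummer
         sum(coalesce(island, 1)) over w as grpnum
  from mark
  window w as (partition by konto_nummer order by start_datum, end_datum)
)
select konto_nummer, min(start_datum) as start_datum, max(end_datum) as end_datum
from grps
group by konto_nummer, grpnum
order by 1, 2, 3
"""


def merge_successive(rows):
    """Load rows into an in-memory konto table and return the merged ranges."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table konto (konto_nummer integer, start_datum text, end_datum text)")
        con.executemany("insert into konto values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in merge_successive(SAMPLE):
    print(row)
```

For konto_nummer 4 this yields two islands, 2020-04-01..2020-04-15 and 2020-04-16..2020-04-30, because 2020-04-15 ends before 2020-04-16 starts.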
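When the overlaps can be arbitrary, I prefer to find them with a cumulative maximum rather than lag().
That handles situations like this one, where A spans both B and C: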
A ------- B -------- B --------------C-C-------A
Here it is:
select konto_nummer, min(start_datum), max(end_datum)
from (select k.*,
count(*) filter (where prev_end_datum is null or prev_end_datum < start_datum) over
(partition by konto_nummer order by start_datum) as grp
from (select k.*,
max(end_datum) over (partition by konto_nummer order by start_datum range between unbounded preceding and '1 second' preceding) as prev_end_datum
from konto k
) k
) k
group by konto_nummer, grp
order by konto_nummer, min(start_datum);
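The logic of this query can be mirrored in plain Python to see why the A/B/C case collapses into one range. This is a sketch, not a replacement for the SQL: the function name `merge_any_overlap` and the konto_nummer 5 rows are hypothetical, and it is quadratic per account. `prev_end` is the maximum end over rows with a strictly earlier start (what `range between unbounded preceding and '1 second' preceding` amounts to on a DATE column), and the group number is a running count of rows that start a new group, with same-start rows sharing a value as under the default RANGE frame:

```python
from datetime import date
from itertools import groupby


def merge_any_overlap(rows):
    """rows: (konto_nummer, start, end) tuples.

    Returns (konto_nummer, min start, max end) per merged group; a row joins the
    current group whenever ANY row with an earlier start still reaches its start.
    """
    result = []
    for konto, konto_rows in groupby(sorted(rows), key=lambda r: r[0]):
        spans = [(start, end) for _, start, end in konto_rows]
        # Flag rows whose start lies after the max end of all strictly earlier starts.
        flags = []
        for start, _ in spans:
            prev_end = max((e for s, e in spans if s < start), default=None)
            flags.append(prev_end is None or prev_end < start)
        # Group number: count of flagged rows with start <= this row's start.
        merged = {}
        for start, end in spans:
            grp = sum(1 for (s, _), flag in zip(spans, flags) if flag and s <= start)
            lo, hi = merged.get(grp, (start, end))
            merged[grp] = (min(lo, start), max(hi, end))
        result.extend((konto, lo, hi) for lo, hi in sorted(merged.values()))
    return result


# The situation from the diagram: A spans both B and C, but B and C do not overlap.
rows = [
    (5, date(2020, 1, 1), date(2020, 1, 25)),   # A
    (5, date(2020, 1, 2), date(2020, 1, 5)),    # B
    (5, date(2020, 1, 10), date(2020, 1, 15)),  # C
]
print(merge_any_overlap(rows))  # [(5, datetime.date(2020, 1, 1), datetime.date(2020, 1, 25))]
```

A previous-row (lag) check would instead split C off from A/B, which is the behavior the question asks for.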
Here is a db<>fiddle.