Overlapping effective dates aggregation
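I'm trying to aggregate overlapping effective dates. Any gap between the dates should be treated as a separate row. I'm using MIN and MAX and getting the output below, but I'd like to get the expected output instead.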
My query:
WITH test_data AS (
SELECT '2020-01-01' AS date_from,
'2020-01-03' AS date_to,
'1' AS product
UNION ALL
SELECT '2020-01-05' AS date_from,
'2020-01-07' AS date_to,
'1' AS product
UNION ALL
SELECT '2020-01-06' AS date_from,
'2020-01-10' AS date_to,
'1' AS product
)
SELECT product,
MIN(date_from) AS date_from,
MAX(date_to) AS date_to
FROM test_data
GROUP BY 1;
Source data
date_from | date_to | product |
---|---|---|
2020-01-01 | 2020-01-03 | 1 |
2020-01-05 | 2020-01-07 | 1 |
2020-01-06 | 2020-01-10 | 1 |
Output
date_from | date_to | product |
---|---|---|
2020-01-01 | 2020-01-10 | 1 |
Expected output
date_from | date_to | product |
---|---|---|
2020-01-01 | 2020-01-03 | 1 |
2020-01-05 | 2020-01-10 | 1 |
Thanks in advance!
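The single-row output can be reproduced by running the question's query in SQLite through Python's built-in `sqlite3` module. SQLite stands in here for whatever database the question targets, and the helper name `run_question_query` is made up for this sketch. `GROUP BY` product puts all three ranges into one group, so MIN and MAX span right across the gap between 2020-01-03 and 2020-01-05.

```python
import sqlite3

# The question's query, unchanged apart from whitespace.
QUESTION_QUERY = """
WITH test_data AS (
  SELECT '2020-01-01' AS date_from, '2020-01-03' AS date_to, '1' AS product
  UNION ALL
  SELECT '2020-01-05' AS date_from, '2020-01-07' AS date_to, '1' AS product
  UNION ALL
  SELECT '2020-01-06' AS date_from, '2020-01-10' AS date_to, '1' AS product
)
SELECT product,
       MIN(date_from) AS date_from,
       MAX(date_to) AS date_to
FROM test_data
GROUP BY 1
"""


def run_question_query():
    """Run the query against an in-memory SQLite database (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUESTION_QUERY).fetchall()
    finally:
        conn.close()


# One group per product, so the gap is lost.
print(run_question_query())  # [('1', '2020-01-01', '2020-01-10')]
```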
This is a type of gaps-and-islands problem. I would recommend an approach like this:
SELECT product,
       MIN(date_from) AS date_from,
       MAX(date_to) AS date_to
FROM (SELECT td.*,
             -- running count of rows that start a new group, in the same
             -- date_from order used to compute prev_date_to
             SUM(CASE WHEN prev_date_to >= date_from THEN 0 ELSE 1 END) OVER (PARTITION BY product ORDER BY date_from) AS grp
      FROM (SELECT td.*,
                   MAX(date_to) OVER (PARTITION BY product ORDER BY date_from ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_date_to
            FROM test_data td
           ) td
     ) td
GROUP BY grp, product
ORDER BY product, MIN(date_from);
Here is a db<>fiddle.
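As a local sanity check, the query can be run in SQLite (version 3.25 or later, which added window functions) from Python's standard `sqlite3` module. The helper `merge_ranges` and its `grp_order` parameter are hypothetical, added only for this sketch. `grp_order` controls the order of the running SUM. Ordering it by date_from matches how prev_date_to is computed. Ordering it by date_to can split an island when a range nested inside an earlier one ends first, as the second test case shows.

```python
import sqlite3

# {grp_order} is filled from a fixed whitelist in merge_ranges, never from user input.
ANSWER_QUERY = """
SELECT product,
       MIN(date_from) AS date_from,
       MAX(date_to) AS date_to
FROM (SELECT td.*,
             SUM(CASE WHEN prev_date_to >= date_from THEN 0 ELSE 1 END)
               OVER (PARTITION BY product ORDER BY {grp_order}) AS grp
      FROM (SELECT td.*,
                   MAX(date_to) OVER (PARTITION BY product ORDER BY date_from
                                      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_date_to
            FROM test_data td
           ) td
     ) td
GROUP BY grp, product
ORDER BY product, MIN(date_from)
"""


def merge_ranges(rows, grp_order="date_from"):
    """rows: (date_from, date_to, product) tuples with ISO-8601 date strings."""
    if grp_order not in ("date_from", "date_to"):
        raise ValueError("grp_order must be 'date_from' or 'date_to'")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test_data (date_from TEXT, date_to TEXT, product TEXT)")
        conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", rows)
        return conn.execute(ANSWER_QUERY.format(grp_order=grp_order)).fetchall()
    finally:
        conn.close()


QUESTION_ROWS = [
    ("2020-01-01", "2020-01-03", "1"),
    ("2020-01-05", "2020-01-07", "1"),
    ("2020-01-06", "2020-01-10", "1"),
]
print(merge_ranges(QUESTION_ROWS))
# [('1', '2020-01-01', '2020-01-03'), ('1', '2020-01-05', '2020-01-10')]
```

With the nested rows 2020-01-01..2020-01-10, 2020-01-02..2020-01-03 and 2020-01-12..2020-01-15, the date_from ordering returns two islands. The date_to ordering returns three, because the nested row sorts first and is counted before the row that contains it.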
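What is this doing? The innermost subquery gets the latest date_to among the preceding rows. This is used to determine whether each row "connects" to a previous row or starts a new group.
The middle subquery flags each row that starts a new group with 1 (and every other row with 0) and takes a running sum of those flags, which gives every row its group number, grp. The outer query then aggregates by this grouping.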
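The three levels of the query can be traced by hand. The following plain-Python sketch mirrors each subquery and returns the intermediate prev_date_to and grp values so they can be inspected. The function name `islands` is hypothetical. The sketch handles a single product, so PARTITION BY is left out.

```python
from datetime import date
from itertools import accumulate


def islands(rows):
    """rows: list of (date_from, date_to) pairs for a single product."""
    rows = sorted(rows)  # ORDER BY date_from (date_to breaks ties)

    # Innermost subquery: MAX(date_to) over all *preceding* rows
    # (ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING); None for the first row.
    prev_date_to = []
    running_max = None
    for _, date_to in rows:
        prev_date_to.append(running_max)
        running_max = date_to if running_max is None else max(running_max, date_to)

    # Middle subquery: 0 if the row overlaps what came before, else 1.
    # A NULL comparison is not true, so the first row always gets 1.
    flags = [0 if prev is not None and prev >= date_from else 1
             for prev, (date_from, _) in zip(prev_date_to, rows)]
    grp = list(accumulate(flags))  # running SUM of the flags

    # Outer query: GROUP BY grp with MIN(date_from) and MAX(date_to).
    merged = {}
    for g, (date_from, date_to) in zip(grp, rows):
        lo, hi = merged.get(g, (date_from, date_to))
        merged[g] = (min(lo, date_from), max(hi, date_to))
    return prev_date_to, grp, list(merged.values())


rows = [
    (date(2020, 1, 1), date(2020, 1, 3)),
    (date(2020, 1, 5), date(2020, 1, 7)),
    (date(2020, 1, 6), date(2020, 1, 10)),
]
prev_date_to, grp, merged = islands(rows)
print(grp)  # [1, 2, 2]
```

For the question's data, prev_date_to comes out as None, 2020-01-03 and 2020-01-07. The second row starts after 2020-01-03, so it opens group 2. The third row starts on 2020-01-06, before 2020-01-07, so it joins group 2. That yields the two expected ranges.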
Merging date ranges can be done with MATCH_RECOGNIZE.
Data preparation:
CREATE OR REPLACE TABLE test_data AS
SELECT '2020-01-01'::DATE AS date_from, '2020-01-03'::DATE AS date_to, '1' AS product
UNION ALL
SELECT '2020-01-05'::DATE AS date_from, '2020-01-07'::DATE AS date_to, '1' AS product
UNION ALL
SELECT '2020-01-06'::DATE AS date_from, '2020-01-10'::DATE AS date_to, '1' AS product;
Query:
SELECT *
FROM test_data t
MATCH_RECOGNIZE(
PARTITION BY product
ORDER BY date_from, date_to
MEASURES FIRST(date_from) date_from, MAX(date_to) date_to
PATTERN(a* b)
DEFINE a AS MAX(date_to) OVER() >= NEXT(date_from)
) mr;
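Python has no MATCH_RECOGNIZE, but a procedural sketch can show how the pattern `a* b` behaves. The sketch rests on three assumptions. First, rows in each product are scanned in date_from, date_to order. Second, the `MAX(date_to)` in DEFINE is taken over the rows already in the current match, including the current row (running semantics). Third, the engine uses its default one-row-per-match output and resumes after the last matched row. The function name `merge_like_match_recognize` is hypothetical.

A row counts as `a` while that running maximum reaches the next row's date_from. The first row that fails the test becomes `b` and closes the match. The last row of a partition always fails, because its NEXT is NULL.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def merge_like_match_recognize(rows):
    """rows: iterable of (product, date_from, date_to) tuples."""
    result = []
    # PARTITION BY product, ORDER BY date_from, date_to
    ordered = sorted(rows, key=itemgetter(0, 1, 2))
    for product, group in groupby(ordered, key=itemgetter(0)):
        part = [(date_from, date_to) for _, date_from, date_to in group]
        i = 0
        while i < len(part):
            # Start of a match: FIRST(date_from) and the running MAX(date_to).
            first_from, max_to = part[i]
            # The current row is an 'a' row while the match-so-far MAX(date_to)
            # reaches the NEXT row's date_from; then the next row joins the match.
            while i + 1 < len(part) and max_to >= part[i + 1][0]:
                i += 1
                max_to = max(max_to, part[i][1])
            # part[i] failed the 'a' test, so it is the 'b' row closing the match.
            result.append((product, first_from, max_to))
            i += 1
    return result


rows = [
    ("1", date(2020, 1, 1), date(2020, 1, 3)),
    ("1", date(2020, 1, 5), date(2020, 1, 7)),
    ("1", date(2020, 1, 6), date(2020, 1, 10)),
]
for product, date_from, date_to in merge_like_match_recognize(rows):
    print(product, date_from, date_to)
```

The greedy `a*` lets one match absorb a whole chain of overlapping ranges. `b` has no condition of its own, so it simply accepts the row that ends the chain.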
Related reading: Merging Overlapping Date Ranges with MATCH_RECOGNIZE by stewashton