按用户合并期间
Combine periods by user
我想按用户重叠的时间间隔将时间段(开始 -> 结束日期)合并为一个时间间隔。
+---------+-------------+------------+------------+
| USER_ID | CONTRACT_ID | START_DATE | END_DATE |
+---------+-------------+------------+------------+
| 1 | 14 | 18.02.2021 | 18.04.2022 |
| 1 | 13 | 01.01.2019 | 01.01.2020 |
| 1 | 12 | 01.01.2018 | 01.01.2019 |
| 1 | 11 | 13.02.2017 | 13.02.2019 |
| 2 | 23 | 19.06.2021 | 18.04.2022 |
| 2 | 22 | 01.07.2019 | 01.07.2020 |
| 2 | 21 | 19.01.2019 | 19.01.2020 |
+---------+-------------+------------+------------+
因此我想要 table 这样:
+---------+------------+------------+
| USER_ID | START_DATE | END_DATE |
+---------+------------+------------+
| 1 | 18.02.2021 | 18.04.2022 |
| 1 | 13.02.2017 | 01.01.2020 |
| 2 | 19.06.2021 | 18.04.2022 |
| 2 | 19.01.2019 | 01.07.2020 |
+---------+------------+------------+
我尝试了不同的选项,但似乎没有任何效果。
WITH DATAD (USER_ID,CONTRACT_ID,START_DATE,END_DATE)AS
(
SELECT 1,14,'2021-02-18','2022-04-18' UNION ALL
SELECT 1,13,'2019-01-01','2020-01-01' UNION ALL
SELECT 1,12,'2018-01-01','2019-01-01' UNION ALL
SELECT 1,11,'2017-02-13','2019-02-13' UNION ALL
SELECT 2,23,'2021-06-19','2022-04-18' UNION ALL
SELECT 2,22,'2019-07-01','2020-07-01' UNION ALL
SELECT 2,21,'2019-01-19','2020-01-19'
),
CTE AS
(
SELECT USER_ID,START_DATE AS TS ,1 AS TYPE,
ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY START_DATE) AS S,NULL AS E
FROM DATAD
UNION ALL
SELECT USER_ID,END_DATE AS TS,-1 AS TYPE,
NULL,ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY END_DATE)AS E
FROM DATAD
),
C2 AS
(
SELECT *,ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY TS,TYPE DESC)AS SE
FROM CTE
),
C3 AS
(
SELECT *,
FLOOR((ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY TS)+1)/2)AS P
FROM C2
CROSS APPLY (VALUES(S-(SE-S)-1,(SE-E)-E)) AS A(CS,CE)
WHERE CS=0 OR CE=0
)
SELECT USER_ID,MIN(TS)AS START_TIME,MAX(TS)AS END_TIME
FROM C3
GROUP BY USER_ID,P
ORDER BY USER_ID;
这是 Itzik Ben-Gan 的“T-SQL 查询”一书中的“copy-paste”。关于“打包间隔”的段落
好的,所以我使用以下逻辑得到答案-
- 将范围的结束日期增加到重叠的较大范围
- 完全消除另一个范围内的所有范围。
所以我使用 cte 来绘制第一步,并使用 WHERE NOT EXISTS 绘制第二步,就像这样
; with extendedset as
(
select
A.user_id,
A.contract_id,
A.start_date,
max(case when B.user_id is null then A.end_date else B.end_date end) end_date
from t A left join t B
on
A.user_id=B.user_id
and A.contract_id<>B.contract_id
and A.end_date>=B.start_date
and A.end_date <=B.end_date
group by A.user_id,
A.contract_id,
A.start_date
)
select
user_id,start_date, end_date
from extendedset t1
where not exists
( select 1 from extendedset t2
where t1.user_id=t2.user_id and t1.contract_id<>t2.contract_id
and t1.start_date >= t2.start_date and t1.end_date <=t2.end_date)
看工作dbfiddle
我想按用户重叠的时间间隔将时间段(开始 -> 结束日期)合并为一个时间间隔。
+---------+-------------+------------+------------+
| USER_ID | CONTRACT_ID | START_DATE | END_DATE |
+---------+-------------+------------+------------+
| 1 | 14 | 18.02.2021 | 18.04.2022 |
| 1 | 13 | 01.01.2019 | 01.01.2020 |
| 1 | 12 | 01.01.2018 | 01.01.2019 |
| 1 | 11 | 13.02.2017 | 13.02.2019 |
| 2 | 23 | 19.06.2021 | 18.04.2022 |
| 2 | 22 | 01.07.2019 | 01.07.2020 |
| 2 | 21 | 19.01.2019 | 19.01.2020 |
+---------+-------------+------------+------------+
因此我想要 table 这样:
+---------+------------+------------+
| USER_ID | START_DATE | END_DATE |
+---------+------------+------------+
| 1 | 18.02.2021 | 18.04.2022 |
| 1 | 13.02.2017 | 01.01.2020 |
| 2 | 19.06.2021 | 18.04.2022 |
| 2 | 19.01.2019 | 01.07.2020 |
+---------+------------+------------+
我尝试了不同的选项,但似乎没有任何效果。
WITH DATAD (USER_ID,CONTRACT_ID,START_DATE,END_DATE)AS
(
SELECT 1,14,'2021-02-18','2022-04-18' UNION ALL
SELECT 1,13,'2019-01-01','2020-01-01' UNION ALL
SELECT 1,12,'2018-01-01','2019-01-01' UNION ALL
SELECT 1,11,'2017-02-13','2019-02-13' UNION ALL
SELECT 2,23,'2021-06-19','2022-04-18' UNION ALL
SELECT 2,22,'2019-07-01','2020-07-01' UNION ALL
SELECT 2,21,'2019-01-19','2020-01-19'
),
CTE AS
(
SELECT USER_ID,START_DATE AS TS ,1 AS TYPE,
ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY START_DATE) AS S,NULL AS E
FROM DATAD
UNION ALL
SELECT USER_ID,END_DATE AS TS,-1 AS TYPE,
NULL,ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY END_DATE)AS E
FROM DATAD
),
C2 AS
(
SELECT *,ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY TS,TYPE DESC)AS SE
FROM CTE
),
C3 AS
(
SELECT *,
FLOOR((ROW_NUMBER()OVER(PARTITION BY USER_ID ORDER BY TS)+1)/2)AS P
FROM C2
CROSS APPLY (VALUES(S-(SE-S)-1,(SE-E)-E)) AS A(CS,CE)
WHERE CS=0 OR CE=0
)
SELECT USER_ID,MIN(TS)AS START_TIME,MAX(TS)AS END_TIME
FROM C3
GROUP BY USER_ID,P
ORDER BY USER_ID;
这是 Itzik Ben-Gan 的“T-SQL 查询”一书中的“copy-paste”。关于“打包间隔”的段落
好的,所以我使用以下逻辑得到答案-
- 将范围的结束日期增加到重叠的较大范围
- 完全消除另一个范围内的所有范围。
所以我使用 cte 来绘制第一步,并使用 WHERE NOT EXISTS 绘制第二步,就像这样
; with extendedset as
(
select
A.user_id,
A.contract_id,
A.start_date,
max(case when B.user_id is null then A.end_date else B.end_date end) end_date
from t A left join t B
on
A.user_id=B.user_id
and A.contract_id<>B.contract_id
and A.end_date>=B.start_date
and A.end_date <=B.end_date
group by A.user_id,
A.contract_id,
A.start_date
)
select
user_id,start_date, end_date
from extendedset t1
where not exists
( select 1 from extendedset t2
where t1.user_id=t2.user_id and t1.contract_id<>t2.contract_id
and t1.start_date >= t2.start_date and t1.end_date <=t2.end_date)
看工作dbfiddle