Grouping rows in SQL when the start date matches the previous row's end date
I hope someone can help me with this.
I have the following sample dataset:
MEM_ID | CLM_ID | ADM_DT | DCHG_DT |
---|---|---|---|
1 | 111 | 01-01-2020 | 02-01-2020 |
1 | 112 | 03-01-2020 | 04-01-2020 |
1 | 113 | 04-01-2020 | 05-01-2020 |
1 | 114 | 06-01-2020 | 07-01-2020 |
2 | 211 | 01-01-2020 | 02-01-2020 |
2 | 212 | 05-01-2020 | 08-01-2020 |
3 | 311 | 02-01-2020 | 03-01-2020 |
3 | 312 | 03-01-2020 | 05-01-2020 |
3 | 313 | 05-01-2020 | 06-01-2020 |
3 | 314 | 07-01-2020 | 08-01-2020 |
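I am trying to create groups within each MEM_ID. If a row's ADM_DT equals the previous row's DCHG_DT, those records should be grouped together.
Below is the expected output: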
MEM_ID | CLM_ID | ADM_DT | DCHG_DT | GROUP_ID |
---|---|---|---|---|
1 | 111 | 01-01-2020 | 02-01-2020 | 1 |
1 | 112 | 03-01-2020 | 04-01-2020 | 2 |
1 | 113 | 04-01-2020 | 05-01-2020 | 2 |
1 | 114 | 06-01-2020 | 07-01-2020 | 3 |
2 | 211 | 01-01-2020 | 02-01-2020 | 1 |
2 | 212 | 05-01-2020 | 08-01-2020 | 2 |
3 | 311 | 02-01-2020 | 03-01-2020 | 1 |
3 | 312 | 03-01-2020 | 05-01-2020 | 1 |
3 | 313 | 05-01-2020 | 06-01-2020 | 1 |
3 | 314 | 07-01-2020 | 08-01-2020 | 2 |
I tried the following:
select DISTINCT MEM_ID
,CLM_ID
,ADM_DT
,DCHG_DT
,CASE WHEN ADM_DT = LAG(DCHG_DT) OVER(PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT) THEN 0 ELSE 1 END AS ISSTART
FROM
table
which produces something like this:
MEM_ID | CLM_ID | ADM_DT | DCHG_DT | ISSTART |
---|---|---|---|---|
1 | 111 | 01-01-2020 | 02-01-2020 | 1 |
1 | 112 | 03-01-2020 | 04-01-2020 | 1 |
1 | 113 | 04-01-2020 | 05-01-2020 | 0 |
1 | 114 | 06-01-2020 | 07-01-2020 | 1 |
2 | 211 | 01-01-2020 | 02-01-2020 | 1 |
2 | 212 | 05-01-2020 | 08-01-2020 | 1 |
3 | 311 | 02-01-2020 | 03-01-2020 | 1 |
3 | 312 | 03-01-2020 | 05-01-2020 | 0 |
3 | 313 | 05-01-2020 | 06-01-2020 | 0 |
3 | 314 | 07-01-2020 | 08-01-2020 | 1 |
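I also looked at other external resources, such as https://www.kodyaz.com/t-sql/sql-query-for-overlapping-time-periods-on-sql-server.aspx
That got me very close, but I realized the author uses a recursive CTE, which Netezza does not support.
Ultimately I want to create these groups so that I can join back to the original table I am working with and sum values by the assigned group for each MEM_ID.
Thanks in advance for any help you can provide.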
Try this:
select MEM_ID, CLM_ID, ADM_DT, DCHG_DT,
       -- running total of the start flags numbers the groups within each MEM_ID
       sum(ISSTART) over(partition by MEM_ID order by ADM_DT, DCHG_DT rows unbounded preceding) as GROUP_ID
from
    (select MEM_ID
           ,CLM_ID
           ,ADM_DT
           ,DCHG_DT
           -- 1 marks a row that starts a new group, 0 a row that continues the previous one
           ,CASE WHEN ADM_DT = LAG(DCHG_DT) OVER(PARTITION BY MEM_ID ORDER BY ADM_DT, DCHG_DT) THEN 0 ELSE 1 END AS ISSTART
     FROM
         table_name) t
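Basically, take a running sum of your ISSTART flag to get the desired output: each 1 starts a new group, so the cumulative total within a MEM_ID is the group number.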
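If you want to check the running-sum idea before running it on Netezza, here is a minimal sketch that runs the same query shape against the sample rows from the question. The assumptions are mine, not the asker's: SQLite (via Python's `sqlite3`, with a bundled SQLite of 3.25 or later for window functions) stands in for Netezza, the table is called `claims`, the helper names `assign_groups` and `group_sizes` are made up for illustration, and `COUNT(*)` stands in for whatever value column you eventually want to sum.

```python
import sqlite3

# Sample rows from the question: (MEM_ID, CLM_ID, ADM_DT, DCHG_DT).
# Dates are kept as the question's text values; their text order happens to be
# chronological for this sample. A real table would use a DATE column.
ROWS = [
    (1, 111, "01-01-2020", "02-01-2020"),
    (1, 112, "03-01-2020", "04-01-2020"),
    (1, 113, "04-01-2020", "05-01-2020"),
    (1, 114, "06-01-2020", "07-01-2020"),
    (2, 211, "01-01-2020", "02-01-2020"),
    (2, 212, "05-01-2020", "08-01-2020"),
    (3, 311, "02-01-2020", "03-01-2020"),
    (3, 312, "03-01-2020", "05-01-2020"),
    (3, 313, "05-01-2020", "06-01-2020"),
    (3, 314, "07-01-2020", "08-01-2020"),
]

# Inner query flags group starts with LAG; outer query takes a running sum.
# "claims" is a hypothetical table name used only for this sketch.
GROUP_SQL = """
SELECT MEM_ID, CLM_ID, ADM_DT, DCHG_DT,
       SUM(ISSTART) OVER (PARTITION BY MEM_ID
                          ORDER BY ADM_DT, DCHG_DT
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS GROUP_ID
FROM (
    SELECT MEM_ID, CLM_ID, ADM_DT, DCHG_DT,
           CASE WHEN ADM_DT = LAG(DCHG_DT) OVER (PARTITION BY MEM_ID
                                                 ORDER BY ADM_DT, DCHG_DT)
                THEN 0 ELSE 1 END AS ISSTART
    FROM claims
) t
"""


def _load(rows):
    """Create an in-memory table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claims (MEM_ID INTEGER, CLM_ID INTEGER, ADM_DT TEXT, DCHG_DT TEXT)"
    )
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)
    return conn


def assign_groups(rows):
    """Return (MEM_ID, CLM_ID, ADM_DT, DCHG_DT, GROUP_ID) for every row."""
    conn = _load(rows)
    try:
        return conn.execute(GROUP_SQL + " ORDER BY MEM_ID, ADM_DT, DCHG_DT").fetchall()
    finally:
        conn.close()


def group_sizes(rows):
    """Roll the grouped rows up to one row per (MEM_ID, GROUP_ID) with a claim count."""
    sql = (
        f"SELECT MEM_ID, GROUP_ID, COUNT(*) FROM ({GROUP_SQL}) g "
        "GROUP BY MEM_ID, GROUP_ID ORDER BY MEM_ID, GROUP_ID"
    )
    conn = _load(rows)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in assign_groups(ROWS):
        print(row)
    for row in group_sizes(ROWS):
        print(row)
```

The second helper shows the shape of the follow-up step from the question: wrap the grouped query in a subquery and aggregate by MEM_ID and GROUP_ID. Swapping `COUNT(*)` for a `SUM` over your own value column should be all that changes.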