通过随时间改变状态对数据进行分组
Grouping Data by Changing Status Over Time
我正在尝试将组号分配给数据集中随时间变化的数据的不同行组。在我的示例中,更改的字段是 tran_seq、prog_id、deg-id、cur_id 和 enroll_status。当这些字段中的任何一个与上一行不同时,我需要一个新的分组编号。当字段与前一行相同时,分组编号应保持不变。当我尝试 ROW_NUMBER()、RANK() 或 DENSE_RANK() 时,我得到同一组的增加值(例如示例中的前 2 行)。我觉得我需要 ORDER BY start_date 因为它是时间数据。
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
| | tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date | desired |
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
| 1 | 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2004-12-11 | 1 |
| 2 | 1 | 6 | 9 | 3 | ENRL | 2006-01-10 | 2006-05-06 | 1 |
| 3 | 1 | 6 | 9 | 59 | ENRL | 2006-08-29 | 2006-12-16 | 2 |
| 4 | 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-05-16 | 3 |
| 5 | 2 | 12 | 23 | 45 | ENRL | 2014-08-18 | 2014-12-05 | 3 |
| 6 | 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 | 4 |
| 7 | 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 | 5 |
| 8 | 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 | 6 |
| 9 | 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 | 7 |
| 10 | 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 | 8 |
| 11 | 2 | 12 | 23 | 45 | ENRL | 2017-01-18 | 2017-05-05 | 9 |
| 12 | 2 | 12 | 23 | 45 | ENRL | 2018-01-17 | 2018-05-11 | 9 |
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
一旦我有了分组数字,我想我可以按这些数字分组以获得我最终想要的东西:具有开始日期和结束日期的不同状态的时间轴。对于上面的示例数据,这将是:
+---+----------+---------+--------+--------+---------------+------------+------------+
| | tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date |
+---+----------+---------+--------+--------+---------------+------------+------------+
| 1 | 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2006-05-06 |
| 2 | 1 | 6 | 9 | 59 | ENRL | 2004-08-29 | 2006-12-16 |
| 3 | 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-12-05 |
| 4 | 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 |
| 5 | 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 |
| 6 | 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 |
| 7 | 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 |
| 8 | 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 |
| 9 | 2 | 12 | 23 | 45 | ENRL | 2017-01-17 | 2018-05-06 |
+---+----------+---------+--------+--------+---------------+------------+------------+
这是一个经典的 XY 问题,因为您要求的是不同解决方案的中间步骤,而不是询问解决方案本身。
然而,由于您将总体最终目标作为一个附录包含在内,以下是无需中间步骤即可实现该目标的方法:
declare @t table(tran_seq int, prog_id int, deg_id int, cur_id int, enroll_status varchar(4), start_date date, end_date date, desired int)
insert into @t values
(1,6,9,3 ,'ENRL','2004-08-22','2004-12-11',1)
,(1,6,9,3 ,'ENRL','2006-01-10','2006-05-06',1)
,(1,6,9,59 ,'ENRL','2006-08-29','2006-12-16',2)
,(2,12,23,45,'ENRL','2014-01-21','2014-05-16',3)
,(2,12,23,45,'ENRL','2014-08-18','2014-12-05',3)
,(2,12,23,45,'LOAP','2015-01-20','2015-05-15',4)
,(2,12,23,45,'ENRL','2015-08-25','2015-12-11',5)
,(2,12,23,45,'LOAP','2016-01-12','2016-05-06',6)
,(2,12,23,45,'ENRL','2016-05-16','2016-08-05',7)
,(2,12,23,45,'LOAJ','2016-08-23','2016-12-02',8)
,(2,12,23,45,'ENRL','2017-01-18','2017-05-05',9)
,(2,12,23,45,'ENRL','2018-01-17','2018-05-11',9)
;
select tran_seq
,prog_id
,deg_id
,cur_id
,enroll_status
,min(start_date) as start_date
,max(end_date) as end_date
from(select *
,row_number() over (order by end_date) - row_number() over (partition by tran_seq,prog_id,deg_id,cur_id,enroll_status order by end_date) as grp
from @t
) AS g
group by tran_seq
,prog_id
,deg_id
,cur_id
,enroll_status
,grp
order by start_date;
输出
+----------+---------+--------+--------+---------------+------------+------------+
| tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date |
+----------+---------+--------+--------+---------------+------------+------------+
| 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2006-05-06 |
| 1 | 6 | 9 | 59 | ENRL | 2006-08-29 | 2006-12-16 |
| 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-12-05 |
| 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 |
| 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 |
| 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 |
| 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 |
| 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 |
| 2 | 12 | 23 | 45 | ENRL | 2017-01-18 | 2018-05-11 |
+----------+---------+--------+--------+---------------+------------+------------+
我正在尝试将组号分配给数据集中随时间变化的数据的不同行组。在我的示例中,更改的字段是 tran_seq、prog_id、deg-id、cur_id 和 enroll_status。当这些字段中的任何一个与上一行不同时,我需要一个新的分组编号。当字段与前一行相同时,分组编号应保持不变。当我尝试 ROW_NUMBER()、RANK() 或 DENSE_RANK() 时,我得到同一组的增加值(例如示例中的前 2 行)。我觉得我需要 ORDER BY start_date 因为它是时间数据。
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
| | tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date | desired |
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
| 1 | 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2004-12-11 | 1 |
| 2 | 1 | 6 | 9 | 3 | ENRL | 2006-01-10 | 2006-05-06 | 1 |
| 3 | 1 | 6 | 9 | 59 | ENRL | 2006-08-29 | 2006-12-16 | 2 |
| 4 | 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-05-16 | 3 |
| 5 | 2 | 12 | 23 | 45 | ENRL | 2014-08-18 | 2014-12-05 | 3 |
| 6 | 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 | 4 |
| 7 | 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 | 5 |
| 8 | 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 | 6 |
| 9 | 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 | 7 |
| 10 | 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 | 8 |
| 11 | 2 | 12 | 23 | 45 | ENRL | 2017-01-18 | 2017-05-05 | 9 |
| 12 | 2 | 12 | 23 | 45 | ENRL | 2018-01-17 | 2018-05-11 | 9 |
+----+----------+---------+--------+--------+---------------+------------+------------+---------+
一旦我有了分组数字,我想我可以按这些数字分组以获得我最终想要的东西:具有开始日期和结束日期的不同状态的时间轴。对于上面的示例数据,这将是:
+---+----------+---------+--------+--------+---------------+------------+------------+
| | tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date |
+---+----------+---------+--------+--------+---------------+------------+------------+
| 1 | 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2006-05-06 |
| 2 | 1 | 6 | 9 | 59 | ENRL | 2004-08-29 | 2006-12-16 |
| 3 | 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-12-05 |
| 4 | 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 |
| 5 | 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 |
| 6 | 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 |
| 7 | 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 |
| 8 | 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 |
| 9 | 2 | 12 | 23 | 45 | ENRL | 2017-01-17 | 2018-05-06 |
+---+----------+---------+--------+--------+---------------+------------+------------+
这是一个经典的 XY 问题,因为您要求的是不同解决方案的中间步骤,而不是询问解决方案本身。
然而,由于您将总体最终目标作为一个附录包含在内,以下是无需中间步骤即可实现该目标的方法:
declare @t table(tran_seq int, prog_id int, deg_id int, cur_id int, enroll_status varchar(4), start_date date, end_date date, desired int)
insert into @t values
(1,6,9,3 ,'ENRL','2004-08-22','2004-12-11',1)
,(1,6,9,3 ,'ENRL','2006-01-10','2006-05-06',1)
,(1,6,9,59 ,'ENRL','2006-08-29','2006-12-16',2)
,(2,12,23,45,'ENRL','2014-01-21','2014-05-16',3)
,(2,12,23,45,'ENRL','2014-08-18','2014-12-05',3)
,(2,12,23,45,'LOAP','2015-01-20','2015-05-15',4)
,(2,12,23,45,'ENRL','2015-08-25','2015-12-11',5)
,(2,12,23,45,'LOAP','2016-01-12','2016-05-06',6)
,(2,12,23,45,'ENRL','2016-05-16','2016-08-05',7)
,(2,12,23,45,'LOAJ','2016-08-23','2016-12-02',8)
,(2,12,23,45,'ENRL','2017-01-18','2017-05-05',9)
,(2,12,23,45,'ENRL','2018-01-17','2018-05-11',9)
;
select tran_seq
,prog_id
,deg_id
,cur_id
,enroll_status
,min(start_date) as start_date
,max(end_date) as end_date
from(select *
,row_number() over (order by end_date) - row_number() over (partition by tran_seq,prog_id,deg_id,cur_id,enroll_status order by end_date) as grp
from @t
) AS g
group by tran_seq
,prog_id
,deg_id
,cur_id
,enroll_status
,grp
order by start_date;
输出
+----------+---------+--------+--------+---------------+------------+------------+
| tran_seq | prog_id | deg_id | cur_id | enroll_status | start_date | end_date |
+----------+---------+--------+--------+---------------+------------+------------+
| 1 | 6 | 9 | 3 | ENRL | 2004-08-22 | 2006-05-06 |
| 1 | 6 | 9 | 59 | ENRL | 2006-08-29 | 2006-12-16 |
| 2 | 12 | 23 | 45 | ENRL | 2014-01-21 | 2014-12-05 |
| 2 | 12 | 23 | 45 | LOAP | 2015-01-20 | 2015-05-15 |
| 2 | 12 | 23 | 45 | ENRL | 2015-08-25 | 2015-12-11 |
| 2 | 12 | 23 | 45 | LOAP | 2016-01-12 | 2016-05-06 |
| 2 | 12 | 23 | 45 | ENRL | 2016-05-16 | 2016-08-05 |
| 2 | 12 | 23 | 45 | LOAJ | 2016-08-23 | 2016-12-02 |
| 2 | 12 | 23 | 45 | ENRL | 2017-01-18 | 2018-05-11 |
+----------+---------+--------+--------+---------------+------------+------------+