SQL - gaps-and-islands problem across multiple attributes

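I have a table that, among other attributes, contains the columns ID, Column1, Column2, Column3, CreatedDate and UpdatedDate.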

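Because attributes other than these 3 columns are also tracked for historical values, the same ID can have multiple rows with identical values in all three columns but different timestamps in [CreatedDate] / [UpdatedDate]. So the data can look like this: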

ID    Column1  Column2      Column3  CreatedDate          UpdatedDate
1122  T1       In Progress  NULL     02/02/2022 18:39:38  29/03/2022 14:25:24
1122  T1       In Progress  NULL     05/01/2022 10:45:50  02/02/2022 18:39:38
1122  T1       In Progress  NULL     03/01/2022 12:11:47  05/01/2022 10:45:50
1122  T1       In Progress  Yes      13/12/2021 21:43:44  03/01/2022 12:11:47
1122  T1       In Progress  NULL     17/02/2021 14:12:15  13/12/2021 21:43:44
1122  T1       In Progress  NULL     22/12/2020 14:38:32  17/02/2021 14:12:15
1122  T1       In Progress  NULL     17/12/2020 18:38:38  22/12/2020 14:38:32
1122  T3       Ready        NULL     30/03/2020 14:35:18  17/12/2020 18:38:38
1122  NULL     Ready        NULL     04/09/2019 18:33:24  30/03/2020 14:35:18
1122  T2       Ready        NULL     07/01/2019 11:07:39  04/09/2019 18:33:24
1122  T2       Ready        NULL     17/09/2018 14:31:17  07/01/2019 11:07:39
1122  T0       Ready        NULL     28/08/2018 14:31:39  17/09/2018 14:31:17
1122  T0       Ready        NULL     13/02/2018 14:48:44  28/08/2018 14:31:39

I want to keep the unique values of all 3 columns in the correct order, so the ideal output would look like this:

ID    Column1  Column2      Column3  CreatedDate          UpdatedDate
1122  T1       In Progress  NULL     03/01/2022 12:11:47  29/03/2022 14:25:24
1122  T1       In Progress  Yes      13/12/2021 21:43:44  03/01/2022 12:11:47
1122  T1       In Progress  NULL     17/12/2020 18:38:38  13/12/2021 21:43:44
1122  T3       Ready        NULL     30/03/2020 14:35:18  17/12/2020 18:38:38
1122  NULL     Ready        NULL     04/09/2019 18:33:24  30/03/2020 14:35:18
1122  T2       Ready        NULL     17/09/2018 14:31:17  04/09/2019 18:33:24
1122  T0       Ready        NULL     13/02/2018 14:48:44  17/09/2018 14:31:17

The code below works fine when there is only one column, but it does not work for multiple columns, because it returns all unique rows.

select ID, Column1, Column2, Column3,  min(createddate), max(updateddate)
from (select t.*,
             sum(case when prev_updatedate >= createddate then 0 else 1 end) over (partition by ID order by createddate) as grp
      from (select h.*,
                   max(updateddate) over (partition by ID order by createddate rows between unbounded preceding and 1 preceding) as prev_updatedate
            from #history h
           ) h
     ) h
group by ID, Column1, Column2, Column3, grp;

Is there any solution for this?

You can try using the difference between two ROW_NUMBER window functions to build a group number for each island, which is the usual gaps-and-islands grouping:
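SELECT ID, Column1, Column2, Column3,
       MIN(createddate) AS CreatedDate,
       MAX(updateddate) AS UpdatedDate
FROM (
  SELECT *,
         -- Row position within the ID minus row position within the
         -- (ID, Column1, Column2, Column3) combination. The difference stays
         -- constant while the same combination repeats on consecutive rows.
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY createddate) -
         ROW_NUMBER() OVER (PARTITION BY ID, Column1, Column2, Column3 ORDER BY createddate) AS grp
  FROM history
) t1
GROUP BY grp, ID, Column1, Column2, Column3
ORDER BY CreatedDate DESC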
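
To check the query above against the sample data without a database server, here is a minimal sketch that runs it in an in-memory SQLite database from Python. Several things here are assumptions and not part of the original thread: SQLite stands in for the poster's engine (window functions need SQLite 3.25 or later), the timestamps are rewritten as ISO-8601 text so they sort chronologically, and the helper names `load_history`, `group_keys` and `collapse_islands` are made up for illustration.

```python
import sqlite3

# Sample rows from the question. Assumption: timestamps are rewritten as
# ISO-8601 text so that plain string ordering matches chronological ordering.
ROWS = [
    (1122, "T0", "Ready", None, "2018-02-13 14:48:44", "2018-08-28 14:31:39"),
    (1122, "T0", "Ready", None, "2018-08-28 14:31:39", "2018-09-17 14:31:17"),
    (1122, "T2", "Ready", None, "2018-09-17 14:31:17", "2019-01-07 11:07:39"),
    (1122, "T2", "Ready", None, "2019-01-07 11:07:39", "2019-09-04 18:33:24"),
    (1122, None, "Ready", None, "2019-09-04 18:33:24", "2020-03-30 14:35:18"),
    (1122, "T3", "Ready", None, "2020-03-30 14:35:18", "2020-12-17 18:38:38"),
    (1122, "T1", "In Progress", None, "2020-12-17 18:38:38", "2020-12-22 14:38:32"),
    (1122, "T1", "In Progress", None, "2020-12-22 14:38:32", "2021-02-17 14:12:15"),
    (1122, "T1", "In Progress", None, "2021-02-17 14:12:15", "2021-12-13 21:43:44"),
    (1122, "T1", "In Progress", "Yes", "2021-12-13 21:43:44", "2022-01-03 12:11:47"),
    (1122, "T1", "In Progress", None, "2022-01-03 12:11:47", "2022-01-05 10:45:50"),
    (1122, "T1", "In Progress", None, "2022-01-05 10:45:50", "2022-02-02 18:39:38"),
    (1122, "T1", "In Progress", None, "2022-02-02 18:39:38", "2022-03-29 14:25:24"),
]

# The group key on its own, one value per source row in chronological order.
GRP_QUERY = """
SELECT CreatedDate,
       ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate)
     - ROW_NUMBER() OVER (PARTITION BY ID, Column1, Column2, Column3
                          ORDER BY CreatedDate) AS grp
FROM history
ORDER BY CreatedDate
"""

# The query from the answer, unchanged apart from layout.
ISLAND_QUERY = """
SELECT ID, Column1, Column2, Column3,
       MIN(CreatedDate) AS CreatedDate, MAX(UpdatedDate) AS UpdatedDate
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY CreatedDate)
       - ROW_NUMBER() OVER (PARTITION BY ID, Column1, Column2, Column3
                            ORDER BY CreatedDate) AS grp
  FROM history
) t1
GROUP BY grp, ID, Column1, Column2, Column3
ORDER BY CreatedDate DESC
"""


def load_history():
    """Create an in-memory table named history and fill it with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (ID INTEGER, Column1 TEXT, Column2 TEXT, "
        "Column3 TEXT, CreatedDate TEXT, UpdatedDate TEXT)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


def group_keys():
    """Return the grp value of every source row, oldest row first."""
    conn = load_history()
    keys = [grp for _created, grp in conn.execute(GRP_QUERY)]
    conn.close()
    return keys


def collapse_islands():
    """Return one row per island, newest island first."""
    conn = load_history()
    rows = conn.execute(ISLAND_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(group_keys())
    for row in collapse_islands():
        print(row)
```

On this data the group keys, oldest row first, come out as 0, 0, 2, 2, 4, 5, 6, 6, 6, 9, 7, 7, 7. The key is only meaningful together with the column values (which is why the GROUP BY lists the columns as well as grp): the two separate T1 / In Progress / NULL runs get different keys, 6 and 7, because the Yes row between them advances the per-ID row number without advancing the per-combination one.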

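
For comparison, a hedged sketch of why the query from the question merges too much on this sample. Its grp only increases when there is a gap in time between consecutive rows, and here every UpdatedDate equals the next row's CreatedDate, so grp never changes and the GROUP BY falls back to the distinct column combinations. As before, SQLite (3.25 or later), the ISO-8601 timestamps, the table name `history` in place of `#history`, the inner aliases `t` and `g`, and the function name `time_gap_grouping` are assumptions made for this illustration.

```python
import sqlite3

# Same sample rows as in the question, with timestamps rewritten as ISO-8601
# text (an assumption) so that string comparison follows time order.
ROWS = [
    (1122, "T0", "Ready", None, "2018-02-13 14:48:44", "2018-08-28 14:31:39"),
    (1122, "T0", "Ready", None, "2018-08-28 14:31:39", "2018-09-17 14:31:17"),
    (1122, "T2", "Ready", None, "2018-09-17 14:31:17", "2019-01-07 11:07:39"),
    (1122, "T2", "Ready", None, "2019-01-07 11:07:39", "2019-09-04 18:33:24"),
    (1122, None, "Ready", None, "2019-09-04 18:33:24", "2020-03-30 14:35:18"),
    (1122, "T3", "Ready", None, "2020-03-30 14:35:18", "2020-12-17 18:38:38"),
    (1122, "T1", "In Progress", None, "2020-12-17 18:38:38", "2020-12-22 14:38:32"),
    (1122, "T1", "In Progress", None, "2020-12-22 14:38:32", "2021-02-17 14:12:15"),
    (1122, "T1", "In Progress", None, "2021-02-17 14:12:15", "2021-12-13 21:43:44"),
    (1122, "T1", "In Progress", "Yes", "2021-12-13 21:43:44", "2022-01-03 12:11:47"),
    (1122, "T1", "In Progress", None, "2022-01-03 12:11:47", "2022-01-05 10:45:50"),
    (1122, "T1", "In Progress", None, "2022-01-05 10:45:50", "2022-02-02 18:39:38"),
    (1122, "T1", "In Progress", None, "2022-02-02 18:39:38", "2022-03-29 14:25:24"),
]

# The question's query: grp is a running count of time gaps between rows.
TIME_GAP_QUERY = """
SELECT ID, Column1, Column2, Column3,
       MIN(CreatedDate), MAX(UpdatedDate)
FROM (SELECT t.*,
             SUM(CASE WHEN prev_updatedate >= CreatedDate THEN 0 ELSE 1 END)
               OVER (PARTITION BY ID ORDER BY CreatedDate) AS grp
      FROM (SELECT h.*,
                   MAX(UpdatedDate) OVER (
                       PARTITION BY ID ORDER BY CreatedDate
                       ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
                   ) AS prev_updatedate
            FROM history h
           ) t
     ) g
GROUP BY ID, Column1, Column2, Column3, grp
"""


def time_gap_grouping():
    """Run the question's query on the sample rows and return its result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history (ID INTEGER, Column1 TEXT, Column2 TEXT, "
        "Column3 TEXT, CreatedDate TEXT, UpdatedDate TEXT)"
    )
    conn.executemany("INSERT INTO history VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(TIME_GAP_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in time_gap_grouping():
        print(row)
```

This returns six rows where seven are wanted: both T1 / In Progress / NULL runs end up in a single row spanning 2020-12-17 18:38:38 to 2022-03-29 14:25:24, which overlaps the Yes row. The ROW_NUMBER difference avoids this because it reacts to a change in the column values, not to a gap in time.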