Merge historical periods of a dimension entity into one

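I have a type 2 slowly changing dimension in which some rows are identical apart from their start and end dates. How can I write a clean SQL query that merges rows that are identical and whose time periods are contiguous?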

Current data

+-------------+---------------------+--------------+------------+
| DimensionID | DimensionAttribute  | RowStartDate | RowEndDate |
+-------------+---------------------+--------------+------------+
|           1 | SomeValue           | 2019-01-01   | 2019-01-31 |
|           1 | SomeValue           | 2019-02-01   | 2019-02-28 |
|           1 | AnotherValue        | 2019-03-01   | 2019-03-31 |
|           1 | SomeValue           | 2019-04-01   | 2019-04-30 |
|           1 | SomeValue           | 2019-05-01   | 2019-05-31 |
|           2 | SomethingElse       | 2019-01-01   | 2019-01-31 |
|           2 | SomethingElse       | 2019-02-01   | 2019-02-28 |
|           2 | SomethingElse       | 2019-03-01   | 2019-03-31 |
|           2 | CompletelyDifferent | 2019-04-01   | 2019-04-30 |
|           2 | SomethingElse       | 2019-05-01   | 2019-05-31 |
+-------------+---------------------+--------------+------------+

Desired result

+-------------+---------------------+--------------+------------+
| DimensionID | DimensionAttribute  | RowStartDate | RowEndDate |
+-------------+---------------------+--------------+------------+
|           1 | SomeValue           | 2019-01-01   | 2019-02-28 |
|           1 | AnotherValue        | 2019-03-01   | 2019-03-31 |
|           1 | SomeValue           | 2019-04-01   | 2019-05-31 |
|           2 | SomethingElse       | 2019-01-01   | 2019-03-31 |
|           2 | CompletelyDifferent | 2019-04-01   | 2019-04-30 |
|           2 | SomethingElse       | 2019-05-01   | 2019-05-31 |
+-------------+---------------------+--------------+------------+

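For this version of the problem, I would use lag() to determine where each group starts, then a cumulative sum to number the groups, and finally an aggregation: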

select DimensionID, DimensionAttribute,
       min(RowStartDate) as RowStartDate, max(RowEndDate) as RowEndDate
from (select t.*,
             -- running count of group starts: rows in the same contiguous run share a grp value
             sum(case when prev_red = dateadd(day, -1, RowStartDate)
                      then 0 else 1
                 end) over (partition by DimensionID, DimensionAttribute order by RowStartDate) as grp
      from (select t.*,
                   -- end date of the previous row with the same id and attribute (null for the first row)
                   lag(RowEndDate) over (partition by DimensionID, DimensionAttribute order by RowStartDate) as prev_red
            from t
           ) t
     ) t
group by DimensionID, DimensionAttribute, grp;
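As a quick check of the query above, here is a minimal sketch that runs the same logic against the sample data from the question. It is not the original answer's code. It assumes an in-memory SQLite database driven from Python, chosen only to keep the example self-contained. SQLite has no dateadd(), so date(RowStartDate, '-1 day') stands in for it, and window functions need SQLite 3.25 or later. The column names follow the sample table, and the names ROWS, QUERY and merge_periods are hypothetical.

```python
import sqlite3

# Sample rows from the question: (DimensionID, DimensionAttribute, RowStartDate, RowEndDate)
ROWS = [
    (1, "SomeValue", "2019-01-01", "2019-01-31"),
    (1, "SomeValue", "2019-02-01", "2019-02-28"),
    (1, "AnotherValue", "2019-03-01", "2019-03-31"),
    (1, "SomeValue", "2019-04-01", "2019-04-30"),
    (1, "SomeValue", "2019-05-01", "2019-05-31"),
    (2, "SomethingElse", "2019-01-01", "2019-01-31"),
    (2, "SomethingElse", "2019-02-01", "2019-02-28"),
    (2, "SomethingElse", "2019-03-01", "2019-03-31"),
    (2, "CompletelyDifferent", "2019-04-01", "2019-04-30"),
    (2, "SomethingElse", "2019-05-01", "2019-05-31"),
]

# Same lag / cumulative sum / aggregate structure, ported to SQLite syntax.
QUERY = """
select DimensionID, DimensionAttribute,
       min(RowStartDate) as RowStartDate, max(RowEndDate) as RowEndDate
from (select t.*,
             sum(case when prev_red = date(RowStartDate, '-1 day')
                      then 0 else 1
                 end) over (partition by DimensionID, DimensionAttribute
                            order by RowStartDate) as grp
      from (select t.*,
                   lag(RowEndDate) over (partition by DimensionID, DimensionAttribute
                                         order by RowStartDate) as prev_red
            from t
           ) t
     ) t
group by DimensionID, DimensionAttribute, grp
order by DimensionID, min(RowStartDate)
"""


def merge_periods(rows):
    """Load rows into an in-memory table t and return the merged periods."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table t (DimensionID integer, DimensionAttribute text, "
        "RowStartDate text, RowEndDate text)"
    )
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in merge_periods(ROWS):
        print(row)
```

Dates are stored as ISO 8601 text, so the equality test against date(...) is a plain string comparison. The final order by is only there to make the output order deterministic.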

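In particular, this query detects gaps between rows. It merges rows only when they adjoin exactly, that is, when the previous end date is one day before the start date. Of course, the condition can be adjusted to allow gaps of 1 or 2 days, or to allow overlaps.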
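One possible way to make that adjustment is sketched below. It assumes SQLite (3.25 or later) driven from Python, not the SQL Server dialect used above. The sample rows, the parameter max_gap_days and the function merge_with_tolerance are all hypothetical. The sketch also swaps lag() for a running maximum of the earlier end dates, which is a substitution made here so that a long earlier row that overlaps several later rows is handled correctly. A row joins the current group when the latest earlier end date falls no more than max_gap_days + 1 days before its start date, so overlapping rows are always merged.

```python
import sqlite3

# Hypothetical rows: a 2-day gap, then an overlap, then a long gap.
ROWS = [
    (1, "SomeValue", "2019-01-01", "2019-01-31"),
    (1, "SomeValue", "2019-02-03", "2019-02-28"),  # Feb 1 and Feb 2 are missing
    (1, "SomeValue", "2019-02-20", "2019-03-15"),  # overlaps the previous row
    (1, "SomeValue", "2019-04-01", "2019-04-30"),  # starts after a 16-day gap
]

QUERY = """
select DimensionID, DimensionAttribute,
       min(RowStartDate) as RowStartDate, max(RowEndDate) as RowEndDate
from (select t.*,
             -- ? is a date modifier such as '-3 days' (gap tolerance + 1)
             sum(case when prev_max_end >= date(RowStartDate, ?)
                      then 0 else 1
                 end) over (partition by DimensionID, DimensionAttribute
                            order by RowStartDate) as grp
      from (select t.*,
                   -- latest end date among all earlier rows in the partition
                   max(RowEndDate) over (partition by DimensionID, DimensionAttribute
                                         order by RowStartDate
                                         rows between unbounded preceding
                                                  and 1 preceding) as prev_max_end
            from t
           ) t
     ) t
group by DimensionID, DimensionAttribute, grp
order by DimensionID, min(RowStartDate)
"""


def merge_with_tolerance(rows, max_gap_days):
    """Merge rows whose periods overlap or are at most max_gap_days apart."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table t (DimensionID integer, DimensionAttribute text, "
        "RowStartDate text, RowEndDate text)"
    )
    con.executemany("insert into t values (?, ?, ?, ?)", rows)
    modifier = f"-{max_gap_days + 1} days"
    result = con.execute(QUERY, (modifier,)).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(merge_with_tolerance(ROWS, 2))  # tolerate gaps of up to 2 days
    print(merge_with_tolerance(ROWS, 0))  # adjacent or overlapping rows only
```

With max_gap_days set to 0, this variant still differs from the exact-match query: overlapping rows are merged, not treated as the start of a new group.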