T-SQL: Remove duplicates from groups, but NOT by taking the top 1 from each group

Question

I do NOT want to get the top 1 from each group! Please pay attention to the explanation I have provided in the last part of my question.

I have the following rows:

| Code | Type | SubType |    Date    |
|:----:|:----:|:-------:|:----------:|
|  100 |  10  |    1    | 17.12.2019 |
|  100 |  10  |    2    | 18.12.2019 |
|  100 |  10  |    2    | 19.12.2019 |
|  100 |  10  |    1    | 20.12.2019 |

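I need to group the rows by the Code, Type and SubType columns. However, I not only have to keep the Date column, I also have to remove only the duplicates located in the middle, as follows: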

| Code | Type | SubType |    Date    |
|:----:|:----:|:-------:|:----------:|
|  100 |  10  |    1    | 17.12.2019 |
|  100 |  10  |    2    | 18.12.2019 |
|  100 |  10  |    1    | 20.12.2019 |

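Let me explain in detail what leads to this situation and why I need to clean up my data before showing it to the end user. I have a history table with 4 columns (Code, Type, SubType and Date). Each row of this table shows a change to the field values of a record on a particular date. In the sample above, the record was changed 4 times on 4 different dates. First, the record was created on 17.12.2019 with Code = 100, Type = 10 and SubType = 1. Then SubType was changed to 2 on 18.12.2019. The next day, on 19.12.2019, SubType was changed to 2 again (a duplicate in my case). Finally, SubType was changed back to 1 on 20.12.2019. I do not need to show the third change, because it is a duplicate in my case.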

I tried ROW_NUMBER() OVER (PARTITION BY Code, Type, SubType ORDER BY Date), but it did not work.
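For readers who want to reproduce the problem, here is a minimal sketch of what that attempt presumably looked like. The CTE name `history`, the `rn = 1` filter, and the assumption that Date is stored as a `date` type are illustrative and not taken from the question.

```sql
-- Sample data from the question; "history" is a hypothetical name.
with history (Code, Type, SubType, [Date]) as (
    select *
    from (values
        (100, 10, 1, cast('20191217' as date)),
        (100, 10, 2, cast('20191218' as date)),
        (100, 10, 2, cast('20191219' as date)),
        (100, 10, 1, cast('20191220' as date))
    ) as v (Code, Type, SubType, [Date])
)
select Code, Type, SubType, [Date]
from (
    select h.*,
           -- numbers the rows inside each (Code, Type, SubType) group by date
           row_number() over (partition by Code, Type, SubType
                              order by [Date]) as rn
    from history h
) x
where rn = 1          -- assumed filter: keeps the first row per group
order by [Date];
-- Returns only the 2019-12-17 and 2019-12-18 rows. The 2019-12-20 row
-- (SubType back to 1) gets rn = 2 in the SubType = 1 group and is lost.
```

This is the "top 1 from each group" behaviour the question wants to avoid: the partition ignores the fact that the two SubType = 1 rows are separated by a different value.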

Answer 1

You want to keep the dates where a change occurs. My suggestion is to use lag() on the date:

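```sql
select t.*
from (select t.*,
             -- previous date on which the same code/type/subtype (CTS) combination appeared
             lag(date) over (partition by code, type, subtype order by date) as prev_cts_date,
             -- date of the immediately preceding row, whatever its values
             lag(date) over (order by date) as prev_date
      from t
     ) t
-- keep the first occurrence of a combination, or a row whose predecessor had different values
where prev_cts_date is null or prev_cts_date <> prev_date;
```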

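An alternative is to apply lag() to each column and then check whether each value has changed. Not only is that cumbersome, but the logic gets worse if NULL values are involved.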
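As a rough sketch of that per-column alternative, the query below compares each row with its predecessor. It is not part of the original answer: the sample CTE, the `seq` helper column, and the `EXISTS (... INTERSECT ...)` idiom (used here as one NULL-safe way to compare several columns at once in SQL Server) are all assumptions made for illustration.

```sql
-- Sample data from the question, supplied inline so the query is self-contained.
with t (code, type, subtype, date) as (
    select *
    from (values
        (100, 10, 1, cast('20191217' as date)),
        (100, 10, 2, cast('20191218' as date)),
        (100, 10, 2, cast('20191219' as date)),
        (100, 10, 1, cast('20191220' as date))
    ) as v (code, type, subtype, date)
)
select s.code, s.type, s.subtype, s.date
from (select t.*,
             lag(code)    over (order by date) as prev_code,
             lag(type)    over (order by date) as prev_type,
             lag(subtype) over (order by date) as prev_subtype,
             row_number() over (order by date) as seq   -- hypothetical helper
      from t
     ) s
where s.seq = 1   -- the very first row has no predecessor, so always keep it
   or not exists (
        -- INTERSECT treats NULLs as equal, so this is a NULL-safe
        -- "all three columns match the previous row" test
        select s.code, s.type, s.subtype
        intersect
        select s.prev_code, s.prev_type, s.prev_subtype
      )
order by s.date;
```

Every additional tracked column needs its own lag() and its own entry in both halves of the comparison, which is the bookkeeping the answer describes as cumbersome.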

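The logic in the prev_cts_date / prev_date query simply asks: "Is the previous date for the CTS (code, type, subtype) combination the same as the previous date overall?" If so, the record is discarded.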
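To see that comparison at work, the sketch below exposes both lag() values for the sample rows from the question. The inline CTE, the `verdict` column, and the assumption that the column is a real `date` type are illustrative additions.

```sql
-- Sample data from the question, supplied inline so the query is self-contained.
with t (code, type, subtype, date) as (
    select *
    from (values
        (100, 10, 1, cast('20191217' as date)),
        (100, 10, 2, cast('20191218' as date)),
        (100, 10, 2, cast('20191219' as date)),
        (100, 10, 1, cast('20191220' as date))
    ) as v (code, type, subtype, date)
)
select s.*,
       case when prev_cts_date is null or prev_cts_date <> prev_date
            then 'keep' else 'discard'
       end as verdict                      -- hypothetical column for inspection
from (select t.*,
             lag(date) over (partition by code, type, subtype order by date) as prev_cts_date,
             lag(date) over (order by date) as prev_date
      from t
     ) s
order by s.date;
-- date        subtype  prev_cts_date  prev_date   verdict
-- 2019-12-17  1        NULL           NULL        keep
-- 2019-12-18  2        NULL           2019-12-17  keep
-- 2019-12-19  2        2019-12-18     2019-12-18  discard
-- 2019-12-20  1        2019-12-17     2019-12-19  keep
```

Only the 19.12.2019 row has the same combination directly before it, so it is the only row where both lag() values agree.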

Answer 2

This looks like a gaps-and-islands problem to me. Here is one approach using row_number():

```sql
select code, type, SubType, Date
from (
    select
        t.*,
        -- SubType is part of the key: two different SubTypes can share the same rn1 - rn2
        row_number() over(partition by code, type, SubType, rn1 - rn2 order by date) rn
    from (
        select 
            t.*,
            row_number() over(partition by code, type order by date) rn1,
            row_number() over(partition by code, type, SubType order by date) rn2
        from mytable t
    ) t
) t
where rn = 1;
```

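This defines the groups by taking the difference between the row numbers over the code, type partition and over the code, type, subtype partition. For a given subtype, that difference stays constant as long as consecutive rows have the same subtype. We then select the first record of each group, again using row_number().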
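To make the row-number difference concrete, here is a small sketch that lists the intermediate values for the sample rows. The inline CTE and the `grp` column name are illustrative, and the column is assumed to be a real `date` type.

```sql
-- Sample data from the question, supplied inline so the query is self-contained.
with mytable (code, type, SubType, Date) as (
    select *
    from (values
        (100, 10, 1, cast('20191217' as date)),
        (100, 10, 2, cast('20191218' as date)),
        (100, 10, 2, cast('20191219' as date)),
        (100, 10, 1, cast('20191220' as date))
    ) as v (code, type, SubType, Date)
)
select t.*,
       row_number() over(partition by code, type order by date)          as rn1,
       row_number() over(partition by code, type, SubType order by date) as rn2,
       row_number() over(partition by code, type order by date)
     - row_number() over(partition by code, type, SubType order by date) as grp
from mytable t
order by date;
-- Date        SubType  rn1  rn2  grp
-- 2019-12-17  1        1    1    0
-- 2019-12-18  2        2    1    1
-- 2019-12-19  2        3    2    1
-- 2019-12-20  1        4    2    2
```

The two consecutive SubType = 2 rows share grp = 1, while the two SubType = 1 rows fall into different islands (0 and 2). Note that grp is only meaningful together with SubType: in a sequence such as 1, 2, 1, 2 the second and third rows would both get grp = 1 despite having different SubTypes.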

Demo on DB Fiddle:

code | type | SubType | Date      
---: | ---: | ------: | :---------
 100 |   10 |       1 | 17.12.2019
 100 |   10 |       2 | 18.12.2019
 100 |   10 |       1 | 20.12.2019
