How can I create sequential group numbers in a query?

I have a query that returns results like this:

   RowID IP         datetime1         datetime2     temp_violation
   ---------------------------------------------------------------
   1     'A'        '1-1-19'          '1-2-19'      0
   2     'A'        '1-2-19'          '1-3-19'      0
   3     'A'        '1-3-19'          '1-4-19'      0
   4     'A'        '1-4-19'          '1-5-19'      1
   5     'A'        '1-5-19'          '1-6-19'      1
   6     'A'        '1-6-19'          '1-7-19'      1
   7     'A'        '1-7-19'          '1-8-19'      0
   8     'A'        '1-8-19'          '1-9-19'      0
   9     'A'        '1-9-19'          '1-10-19'     0
   10    'B'        '1-1-19'          '1-2-19'      0
   11    'B'        '1-2-19'          '1-3-19'      0
   12    'B'        '1-3-19'          '1-4-19'      0
   13    'B'        '1-4-19'          '1-5-19'      1
   14    'B'        '1-5-19'          '1-6-19'      1
   15    'B'        '1-6-19'          '1-7-19'      1
   16    'B'        '1-7-19'          '1-8-19'      0
   17    'B'        '1-8-19'          '1-9-19'      0
   18    'B'        '1-9-19'          '1-10-19'     0

For each IP, I need to return a result set like this:

   RowID IP         datetime1         datetime2     temp_violation  groupnum
   -------------------------------------------------------------------------
   1     'A'        '1-1-19'          '1-2-19'      0               1
   2     'A'        '1-2-19'          '1-3-19'      0               1
   3     'A'        '1-3-19'          '1-4-19'      0               1
   4     'A'        '1-4-19'          '1-5-19'      1               2
   5     'A'        '1-5-19'          '1-6-19'      1               2
   6     'A'        '1-6-19'          '1-7-19'      1               2
   7     'A'        '1-7-19'          '1-8-19'      0               3
   8     'A'        '1-8-19'          '1-9-19'      0               3
   9     'A'        '1-9-19'          '1-10-19'     0               3
   10    'B'        '1-1-19'          '1-2-19'      0               1
   11    'B'        '1-2-19'          '1-3-19'      0               1
   12    'B'        '1-3-19'          '1-4-19'      0               1
   13    'B'        '1-4-19'          '1-5-19'      1               2
   14    'B'        '1-5-19'          '1-6-19'      1               2
   15    'B'        '1-6-19'          '1-7-19'      1               2
   16    'B'        '1-7-19'          '1-8-19'      0               3
   17    'B'        '1-8-19'          '1-9-19'      0               3
   18    'B'        '1-9-19'          '1-10-19'     0               3

For example: for IP A, the violations go from 0/0/0 to 1/1/1 and back to 0/0/0, so the query needs to identify the first 0/0/0 as group 1, then the 1/1/1 as group 2, and finally the last 0/0/0 as group 3.

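For the IP B rows I have restarted the numbering at 1, but it does not have to restart: it could label the first group as group 4, the next as group 5, and the one after that as group 6. All that matters is that the group number is unique for each IP and each run of consecutive identical temp_violation values.

The tricky part is that I don't want to loop over every row, because there may be millions of rows, and I'm not familiar with CTEs (I don't even know whether they would help here). I tried a bunch of things with row_number(), rank(), dense_rank(), and ntile(), but I couldn't find a clever way to use them for this.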

This is a gaps-and-islands problem. The simplest approach is probably lag() and a cumulative sum:

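-- Flag each row whose temp_violation differs from the previous row of the same IP,
-- then take a running sum of the flags. prev_tv is NULL on the first row of each IP,
-- so that row always starts group 1.
select t.*,
       sum(case when temp_violation = prev_tv then 0 else 1 end) over (partition by ip order by rowid) as groupnum
from (select t.*,
             lag(temp_violation) over (partition by ip order by rowid) as prev_tv
      from t
     ) t;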
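
To check the lag() plus running-sum logic without a SQL Server instance, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. Several things here are assumptions and not part of the question: SQLite (3.25 or later, for window functions) stands in for SQL Server, the table is called `t`, the row identifier column is named `row_id` so it does not clash with SQLite's built-in rowid, and the datetime columns are left out because the grouping does not use them.

```python
import sqlite3

# Same pattern as the question's sample: 0/0/0, 1/1/1, 0/0/0 for IPs A and B.
SAMPLE_ROWS = []
for ip in ("A", "B"):
    for tv in (0, 0, 0, 1, 1, 1, 0, 0, 0):
        SAMPLE_ROWS.append((len(SAMPLE_ROWS) + 1, ip, tv))

# row_id is a hypothetical column name standing in for RowID.
LAG_QUERY = """
select row_id, ip, temp_violation,
       sum(case when temp_violation = prev_tv then 0 else 1 end)
           over (partition by ip order by row_id) as groupnum
from (select row_id, ip, temp_violation,
             lag(temp_violation) over (partition by ip order by row_id) as prev_tv
      from t
     ) s
order by row_id
"""


def group_numbers(rows):
    """Load rows into an in-memory table and return (row_id, ip, tv, groupnum)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (row_id integer, ip text, temp_violation integer)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(LAG_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in group_numbers(SAMPLE_ROWS):
        print(row)
```

The group number restarts at 1 for each IP because both window functions partition by ip. That matches the desired output in the question, although the question says restarting is optional.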

Oops, I notice that you are using SQL Server 2008, so you don't have lag(). In that case, the difference of row numbers is the better approach:

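-- seqnum - seqnum_2 is constant within each run of identical temp_violation values,
-- so (ip, temp_violation, seqnum - seqnum_2) identifies one run; ranking the runs by
-- their first rowid gives sequential group numbers.
select t.*,
       dense_rank() over (partition by ip order by min_rowid) as groupnum
from (select t.*,
             min(rowid) over (partition by ip, temp_violation, seqnum - seqnum_2) as min_rowid
      from (select t.*,
                   row_number() over (partition by ip order by rowid) as seqnum,
                   row_number() over (partition by ip, temp_violation order by rowid) as seqnum_2
            from t
           ) t
     ) t;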
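
If it is not obvious why the difference of the two row numbers marks the runs, looking at the intermediate values helps. The sketch below only computes seqnum, seqnum_2, and their difference for the sample pattern. As before, the setup is an assumption for illustration: SQLite (3.25 or later) via Python in place of SQL Server, a table named `t`, and a hypothetical `row_id` column in place of RowID.

```python
import sqlite3

# Same pattern as the question's sample: 0/0/0, 1/1/1, 0/0/0 for IPs A and B.
SAMPLE_ROWS = []
for ip in ("A", "B"):
    for tv in (0, 0, 0, 1, 1, 1, 0, 0, 0):
        SAMPLE_ROWS.append((len(SAMPLE_ROWS) + 1, ip, tv))

# Innermost step of the row-number approach, plus the difference as its own column.
DIFF_QUERY = """
select row_id, ip, temp_violation, seqnum, seqnum_2, seqnum - seqnum_2 as diff
from (select row_id, ip, temp_violation,
             row_number() over (partition by ip order by row_id) as seqnum,
             row_number() over (partition by ip, temp_violation order by row_id) as seqnum_2
      from t
     ) s
order by row_id
"""


def row_number_differences(rows):
    """Return (row_id, ip, tv, seqnum, seqnum_2, diff) for every row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (row_id integer, ip text, temp_violation integer)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(DIFF_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in row_number_differences(SAMPLE_ROWS):
        print(row)
```

For IP A the difference is 0 for the first three rows and 3 for the remaining six, so the 1/1/1 run and the final 0/0/0 run share the same difference. That is why temp_violation has to be part of the partition together with the difference: only the combination of the two identifies a single run.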