How to group by consecutive records in SQL

My table has these records:

ID  Colour
------------
 1   Red
 2   Red
 3   Red
 4   Red
 5   Red
 6   Green
 7   Green
 8   Green
 9   Green
10   Red
11   Red
12   Red
13   Red
14   Green
15   Green
16   Green
17   Blue
18   Blue
19   Red
20   Blue

I can easily group by colour like this:

SELECT Colour, MIN(ID) AS iMin, MAX(ID) AS iMax
FROM MyTable
GROUP BY Colour

This returns the following result:

Colour     iMin     iMax
-------------------------
Red        1        19
Green      6        16
Blue       17       20

But this is not what I want, because Red does not run unbroken from 1 to 19; Green breaks the sequence.

The result should look like this:

Colour     iMin     iMax
------------------------
Red        1        5
Green      6        9
Red        10       13
Green      14       16
Blue       17       18
Red        19       19
Blue       20       20

I managed to do this with a cursor, but I would like to know whether there is a more efficient way.

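This is a gaps-and-islands problem. Assuming id increments without gaps, you can use the difference between id and row_number() to define groups of "adjacent" records that have the same colour:
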
select 
    colour, 
    min(id) iMin,
    max(id) iMax
from (
    select t.*, row_number() over(partition by colour order by id) rn
    from mytable t
) t
group by colour, id - rn
order by min(id)

Demo on DB Fiddle:

colour | iMin | iMax
:----- | ---: | ---:
Red    |    1 |    5
Green  |    6 |    9
Red    |   10 |   13
Green  |   14 |   16
Blue   |   17 |   18
Red    |   19 |   19
Blue   |   20 |   20
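
The assumption that id has no gaps matters for this query. The following is a minimal sketch, not part of the original answer, that runs the same query through Python's standard-library sqlite3 module (assuming SQLite 3.25 or later for window functions). The helper name `islands` and the two small data sets are made up for illustration.

```python
import sqlite3

# Same query as above: within one colour, id - rn stays constant
# only while the ids themselves increase by exactly 1.
ISLANDS_SQL = """
    SELECT colour, min(id) AS iMin, max(id) AS iMax
    FROM (SELECT t.*,
                 row_number() OVER (PARTITION BY colour ORDER BY id) AS rn
          FROM mytable AS t) AS s
    GROUP BY colour, id - rn
    ORDER BY min(id)
"""


def islands(rows):
    """Load (id, colour) rows into an in-memory table and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, colour TEXT)")
    con.executemany("INSERT INTO mytable (id, colour) VALUES (?, ?)", rows)
    result = con.execute(ISLANDS_SQL).fetchall()
    con.close()
    return result


# Hypothetical data: ids without gaps, then the same shape with id 3 missing.
dense = [(1, "Red"), (2, "Red"), (3, "Red"), (4, "Green"), (5, "Red")]
gappy = [(1, "Red"), (2, "Red"), (4, "Red"), (5, "Green"), (6, "Red")]

if __name__ == "__main__":
    print(islands(dense))  # [('Red', 1, 3), ('Green', 4, 4), ('Red', 5, 5)]
    print(islands(gappy))  # [('Red', 1, 2), ('Red', 4, 4), ('Green', 5, 5), ('Red', 6, 6)]
```

With the gap-free ids the first three Red rows form one island. With id 3 missing, the Red rows 1, 2 and 4 are still adjacent records, but the query splits them into two groups, so this form is only safe when the ids are dense.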

This is a gaps-and-islands problem. You can solve it using a difference of row numbers:

select colour, min(id), max(id)
from (select t.*,
             row_number() over (order by id) as seqnum,
             row_number() over (partition by colour order by id) as seqnum_c
      from t
     ) t
group by colour, (seqnum - seqnum_c);

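Here is a db<>fiddle.

It is hard to explain how this works. However, if you look at the results of the subquery, you will see how the difference of the row numbers identifies adjacent colours.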
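
To make the subquery results visible, here is a minimal sketch using Python's standard-library sqlite3 module (assuming SQLite 3.25 or later for window functions). It loads the sample data from the question into a table named t and returns each row with both row numbers and their difference. The function name `subquery_rows` and the extra `diff` column are illustrative additions.

```python
import sqlite3

# The 20 sample rows from the question, in id order.
COLOURS = (["Red"] * 5 + ["Green"] * 4 + ["Red"] * 4 + ["Green"] * 3
           + ["Blue"] * 2 + ["Red", "Blue"])


def subquery_rows():
    """Return (id, colour, seqnum, seqnum_c, diff) for every sample row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, colour TEXT)")
    con.executemany("INSERT INTO t (id, colour) VALUES (?, ?)",
                    list(enumerate(COLOURS, start=1)))
    rows = con.execute("""
        SELECT id, colour, seqnum, seqnum_c, seqnum - seqnum_c AS diff
        FROM (SELECT t.*,
                     row_number() OVER (ORDER BY id) AS seqnum,
                     row_number() OVER (PARTITION BY colour ORDER BY id) AS seqnum_c
              FROM t) AS s
        ORDER BY id
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in subquery_rows():
        print(row)
```

On the sample data the difference is 0 for ids 1 to 5 (Red), 5 for ids 6 to 9 (Green), 4 for ids 10 to 13 (Red), 9 for ids 14 to 16 (Green), 16 for ids 17 and 18 (Blue), 9 for id 19 (Red) and 17 for id 20 (Blue). The value is constant inside each run. It can repeat across different colours (9 appears for both Green and Red), which is why the outer query groups by colour as well as by the difference.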

The following query works regardless of whether the id column is an integer and whether its values are consecutive:

;with c0 as(
select id, color,
       ROW_NUMBER() over(order by id)*
       (case when color <> LAG(color, 1, '') over(order by id) then 1 else 0 end) as color_id
from #temp
), c1 as(
select id, color, color_id, SUM(color_id) over(order by id) as color_gid
from c0
)
select color, MIN(id) as idMin, MAX(id) as idMax
from c1
group by color, color_gid

It can be extended to order by column A, group by consecutive values of column B, and compute aggregates of column C, as follows:

;with c0 as(
select A, C, B,  -- A must be selected here so that c1 can order by it
       ROW_NUMBER() over(order by A)*
       (case when B <> LAG(B, 1, '') over(order by A) then 1 else 0 end) as B_id
from TableName
), c1 as(
select C, B, B_id, SUM(B_id) over(order by A) as B_gid
from c0
)
select B, MIN(C) as CMin, MAX(C) as CMax
from c1
group by B, B_gid
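
The claim that the LAG-based query does not depend on integer or consecutive ids can be checked with a small sketch. It uses Python's standard-library sqlite3 module (assuming SQLite 3.25 or later) in place of SQL Server, so the `#temp` table becomes an ordinary table with the made-up name `temp_colours`, and the leading `;` is dropped. The function name and the sample rows are also illustrative assumptions.

```python
import sqlite3


def colour_runs(rows):
    """Group (id, color) rows into runs of equal color, ordered by id."""
    con = sqlite3.connect(":memory:")
    # REAL ids: neither integers nor evenly spaced.
    con.execute("CREATE TABLE temp_colours (id REAL PRIMARY KEY, color TEXT)")
    con.executemany("INSERT INTO temp_colours (id, color) VALUES (?, ?)", rows)
    result = con.execute("""
        WITH c0 AS (
            -- color_id is the row number where a new run starts, else 0
            SELECT id, color,
                   row_number() OVER (ORDER BY id)
                   * (CASE WHEN color <> lag(color, 1, '') OVER (ORDER BY id)
                           THEN 1 ELSE 0 END) AS color_id
            FROM temp_colours
        ), c1 AS (
            -- the running sum only changes at the start of a run
            SELECT id, color, sum(color_id) OVER (ORDER BY id) AS color_gid
            FROM c0
        )
        SELECT color, min(id) AS idMin, max(id) AS idMax
        FROM c1
        GROUP BY color, color_gid
        ORDER BY idMin
    """).fetchall()
    con.close()
    return result


# Hypothetical rows with fractional, irregularly spaced ids.
sample = [(0.5, "Red"), (2.0, "Red"), (7.25, "Green"), (7.5, "Green"), (30.0, "Red")]

if __name__ == "__main__":
    print(colour_runs(sample))
    # [('Red', 0.5, 2.0), ('Green', 7.25, 7.5), ('Red', 30.0, 30.0)]
```

Because the group id is built from the ordering of the rows and not from the id values themselves, gaps and non-integer ids do not split a run. The default value `''` passed to lag assumes the color column is a string and never holds an empty string in the first row.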