如何按 SQL 中的连续记录分组
How to group by continuous records in SQL
我的table有这些记录
ID Colour
------------
1 Red
2 Red
3 Red
4 Red
5 Red
6 Green
7 Green
8 Green
9 Green
10 Red
11 Red
12 Red
13 Red
14 Green
15 Green
16 Green
17 Blue
18 Blue
19 Red
20 Blue
我可以像这样轻松地按颜色分组
SELECT Colour, MIN(ID) AS iMin, MAX(ID) AS iMax
FROM MyTable
GROUP BY Colour
这会return这个结果
Colour iMin iMax
-------------------------
Red 1 19
Green 6 16
Blue 17 20
但这不是我想要的,因为红色并没有从 1 一直到 19,绿色打破了序列。
结果应该是这样的
Colour iMin iMax
------------------------
Red 1 5
Green 6 9
Red 10 13
Green 14 16
Blue 17 18
Red 19 19
Blue 20 20
我设法通过光标做到了这一点,但想知道是否有更有效的方法来做到这一点
这是一个缺口和孤岛问题。假设 id
不断递增,您可以使用 row_number()
之间的差异来定义具有相同 colour
:
的 "adjacent" 组记录
select
colour,
min(id) iMin,
max(id) iMax
from (
select t.*, row_number() over(partition by colour order by id) rn
from mytable t
) t
group by colour, id - rn
order by min(id)
colour | iMin | iMax
:----- | ---: | ---:
Red | 1 | 5
Green | 6 | 9
Red | 10 | 13
Green | 14 | 16
Blue | 17 | 18
Red | 19 | 19
Blue | 20 | 20
这是一个缺口和孤岛问题。你可以通过行号的不同来解决这个问题:
select colour, min(id), max(id)
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by colour order by id) as seqnum_c
from t
) t
group by colour, (seqnum - seqnum_c);
Here 是一个 db<>fiddle.
很难解释这是如何工作的。但是,如果您查看子查询的结果,您将看到行号的差异如何识别相邻的颜色。
无论id列是否为整数且id列的值是否连续,查询均有效
;with c0 as(
select id, color,
ROW_NUMBER() over(order by id)*
(case when color <> LAG(color, 1, '') over(order by id) then 1 else 0 end) as color_id
from #temp
), c1 as(
select id, color, color_id, SUM(color_id) over(order by id) as color_gid
from c0
)
select color, MIN(id) as idMin, MAX(id) as idMax
from c1
group by color, color_gid
它可以扩展为按 a 列排序,按 b 列的连续值分组,并找到 c 列的聚合值,如下所示:
;with c0 as(
select C, B,
ROW_NUMBER() over(order by A)*
(case when B <> LAG(B, 1, '') over(order by A) then 1 else 0 end) as B_id
from TableName
), c1 as(
select C, B, B_id, SUM(B_id) over(order by A) as B_gid
from c0
)
select B, MIN(C) as CMin, MAX(C) as CMax
from c1
group by B, B_gid
我的table有这些记录
ID Colour
------------
1 Red
2 Red
3 Red
4 Red
5 Red
6 Green
7 Green
8 Green
9 Green
10 Red
11 Red
12 Red
13 Red
14 Green
15 Green
16 Green
17 Blue
18 Blue
19 Red
20 Blue
我可以像这样轻松地按颜色分组
SELECT Colour, MIN(ID) AS iMin, MAX(ID) AS iMax
FROM MyTable
GROUP BY Colour
这会return这个结果
Colour iMin iMax
-------------------------
Red 1 19
Green 6 16
Blue 17 20
但这不是我想要的,因为红色并没有从 1 一直到 19,绿色打破了序列。
结果应该是这样的
Colour iMin iMax
------------------------
Red 1 5
Green 6 9
Red 10 13
Green 14 16
Blue 17 18
Red 19 19
Blue 20 20
我设法通过光标做到了这一点,但想知道是否有更有效的方法来做到这一点
这是一个缺口和孤岛问题。假设 id
不断递增,您可以使用 row_number()
之间的差异来定义具有相同 colour
:
select
colour,
min(id) iMin,
max(id) iMax
from (
select t.*, row_number() over(partition by colour order by id) rn
from mytable t
) t
group by colour, id - rn
order by min(id)
colour | iMin | iMax :----- | ---: | ---: Red | 1 | 5 Green | 6 | 9 Red | 10 | 13 Green | 14 | 16 Blue | 17 | 18 Red | 19 | 19 Blue | 20 | 20
这是一个缺口和孤岛问题。你可以通过行号的不同来解决这个问题:
select colour, min(id), max(id)
from (select t.*,
row_number() over (order by id) as seqnum,
row_number() over (partition by colour order by id) as seqnum_c
from t
) t
group by colour, (seqnum - seqnum_c);
Here 是一个 db<>fiddle.
很难解释这是如何工作的。但是,如果您查看子查询的结果,您将看到行号的差异如何识别相邻的颜色。
无论id列是否为整数且id列的值是否连续,查询均有效
;with c0 as(
select id, color,
ROW_NUMBER() over(order by id)*
(case when color <> LAG(color, 1, '') over(order by id) then 1 else 0 end) as color_id
from #temp
), c1 as(
select id, color, color_id, SUM(color_id) over(order by id) as color_gid
from c0
)
select color, MIN(id) as idMin, MAX(id) as idMax
from c1
group by color, color_gid
它可以扩展为按 a 列排序,按 b 列的连续值分组,并找到 c 列的聚合值,如下所示:
;with c0 as(
select C, B,
ROW_NUMBER() over(order by A)*
(case when B <> LAG(B, 1, '') over(order by A) then 1 else 0 end) as B_id
from TableName
), c1 as(
select C, B, B_id, SUM(B_id) over(order by A) as B_gid
from c0
)
select B, MIN(C) as CMin, MAX(C) as CMax
from c1
group by B, B_gid