How can I create sequential group numbers in a query?
I have a query that returns results like this:
RowID IP datetime1 datetime2 temp_violation
---------------------------------------------------------------
1 'A' '1-1-19' '1-2-19' 0
2 'A' '1-2-19' '1-3-19' 0
3 'A' '1-3-19' '1-4-19' 0
4 'A' '1-4-19' '1-5-19' 1
5 'A' '1-5-19' '1-6-19' 1
6 'A' '1-6-19' '1-7-19' 1
7 'A' '1-7-19' '1-8-19' 0
8 'A' '1-8-19' '1-9-19' 0
9 'A' '1-9-19' '1-10-19' 0
10 'B' '1-1-19' '1-2-19' 0
11 'B' '1-2-19' '1-3-19' 0
12 'B' '1-3-19' '1-4-19' 0
13 'B' '1-4-19' '1-5-19' 1
14 'B' '1-5-19' '1-6-19' 1
15 'B' '1-6-19' '1-7-19' 1
16 'B' '1-7-19' '1-8-19' 0
17 'B' '1-8-19' '1-9-19' 0
18 'B' '1-9-19' '1-10-19' 0
For each IP, I need to return a result set like this:
RowID IP datetime1 datetime2 temp_violation groupnum
-------------------------------------------------------------------------
1 'A' '1-1-19' '1-2-19' 0 1
2 'A' '1-2-19' '1-3-19' 0 1
3 'A' '1-3-19' '1-4-19' 0 1
4 'A' '1-4-19' '1-5-19' 1 2
5 'A' '1-5-19' '1-6-19' 1 2
6 'A' '1-6-19' '1-7-19' 1 2
7 'A' '1-7-19' '1-8-19' 0 3
8 'A' '1-8-19' '1-9-19' 0 3
9 'A' '1-9-19' '1-10-19' 0 3
10 'B' '1-1-19' '1-2-19' 0 1
11 'B' '1-2-19' '1-3-19' 0 1
12 'B' '1-3-19' '1-4-19' 0 1
13 'B' '1-4-19' '1-5-19' 1 2
14 'B' '1-5-19' '1-6-19' 1 2
15 'B' '1-6-19' '1-7-19' 1 2
16 'B' '1-7-19' '1-8-19' 0 3
17 'B' '1-8-19' '1-9-19' 0 3
18 'B' '1-9-19' '1-10-19' 0 3
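For example, for IP A the violations go from 0/0/0 to 1/1/1 and back to 0/0/0, so the query needs to identify the first 0/0/0 run as group 1, the 1/1/1 run as group 2, and the final 0/0/0 run as group 3.
For the IP B rows I restarted the numbering at 1, but it does not have to restart: the first group could be labeled 4, the next 5, and the next 6. The only thing that matters is that the group number is unique for each IP and each run of consecutive identical temp_violation values. The tricky part is that I do not want to loop over every row, because there may be millions of rows, and I am not familiar with CTEs (I do not even know whether they would help here). I have tried a number of things with row_number(), rank(), dense_rank(), and ntile(), but I could not find a clever way to use them for this.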
This is a gaps-and-islands problem. The simplest approach is probably lag() with a cumulative sum:
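select t.*,
       -- running count of value changes within each IP is the group number
       sum(case when temp_violation = prev_tv then 0 else 1 end) over (partition by ip order by rowid) as groupnum
from (select t.*,
             -- previous row's value within the same IP (NULL on the first row)
             lag(temp_violation) over (partition by ip order by rowid) as prev_tv
      from t
     ) t;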
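To make the logic of the lag() and cumulative-sum query easier to follow, here is a minimal Python sketch that traces the same computation on the sample rows from the question. The names `ROWS` and `group_numbers` are made up for illustration, and the sketch assumes rowid orders the rows within each IP; it is not meant as a replacement for the SQL.

```python
# Sample rows from the question as (rowid, ip, temp_violation).
ROWS = [(i + 1, "A", v) for i, v in enumerate([0, 0, 0, 1, 1, 1, 0, 0, 0])]
ROWS += [(i + 10, "B", v) for i, v in enumerate([0, 0, 0, 1, 1, 1, 0, 0, 0])]


def group_numbers(rows):
    """Return (rowid, ip, temp_violation, groupnum) tuples ordered by ip, rowid."""
    prev_tv = {}  # ip -> temp_violation of the previous row (the lag() value)
    running = {}  # ip -> cumulative sum of the change flags seen so far
    out = []
    for rowid, ip, tv in sorted(rows, key=lambda r: (r[1], r[0])):
        # The first row of an IP has no previous value, so it counts as a
        # change, like the ELSE branch of the CASE when prev_tv is NULL.
        changed = ip not in prev_tv or prev_tv[ip] != tv
        running[ip] = running.get(ip, 0) + (1 if changed else 0)
        prev_tv[ip] = tv
        out.append((rowid, ip, tv, running[ip]))
    return out


for row in group_numbers(ROWS):
    print(row)
```

Each change of temp_violation within an IP adds 1 to the running total, so every row in the same run ends up with the same group number.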
Oops, I notice that you are using SQL Server 2008, so you do not have lag(). In that case, the difference of row numbers is the better approach:
select t.*,
       -- number the islands within each IP by where they start
       dense_rank() over (partition by ip order by min_rowid) as groupnum
from (select t.*,
             -- rows of one island share ip, temp_violation and seqnum - seqnum_2
             min(rowid) over (partition by ip, temp_violation, seqnum - seqnum_2) as min_rowid
      from (select t.*,
                   row_number() over (partition by ip order by rowid) as seqnum,
                   row_number() over (partition by ip, temp_violation order by rowid) as seqnum_2
            from t
           ) t
     ) t;
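It may not be obvious why the difference of the two row numbers identifies a run. The following Python sketch traces seqnum, seqnum_2, and their difference on the sample rows, then derives the group number the same way the query does. The function names are hypothetical and the sketch assumes rowid is unique and orders the rows within each IP.

```python
# Sample rows from the question as (rowid, ip, temp_violation).
ROWS = [(i + 1, "A", v) for i, v in enumerate([0, 0, 0, 1, 1, 1, 0, 0, 0])]
ROWS += [(i + 10, "B", v) for i, v in enumerate([0, 0, 0, 1, 1, 1, 0, 0, 0])]


def row_number_trace(rows):
    """Return (rowid, ip, tv, seqnum, seqnum_2, diff) ordered by ip, rowid."""
    seqnum = {}    # ip -> row_number() partitioned by ip
    seqnum_2 = {}  # (ip, tv) -> row_number() partitioned by ip, temp_violation
    out = []
    for rowid, ip, tv in sorted(rows, key=lambda r: (r[1], r[0])):
        seqnum[ip] = seqnum.get(ip, 0) + 1
        seqnum_2[(ip, tv)] = seqnum_2.get((ip, tv), 0) + 1
        s, s2 = seqnum[ip], seqnum_2[(ip, tv)]
        out.append((rowid, ip, tv, s, s2, s - s2))
    return out


def group_numbers(rows):
    """Return (rowid, ip, tv, groupnum) using the row-number difference."""
    trace = row_number_trace(rows)
    # min(rowid) over (partition by ip, temp_violation, seqnum - seqnum_2)
    min_rowid = {}
    for rowid, ip, tv, _, _, diff in trace:
        key = (ip, tv, diff)
        min_rowid[key] = min(rowid, min_rowid.get(key, rowid))
    # dense_rank() over (partition by ip order by min_rowid)
    starts = {}
    for (ip, _, _), first in min_rowid.items():
        starts.setdefault(ip, set()).add(first)
    rank = {
        (ip, first): n
        for ip, firsts in starts.items()
        for n, first in enumerate(sorted(firsts), start=1)
    }
    return [
        (rowid, ip, tv, rank[(ip, min_rowid[(ip, tv, diff)])])
        for rowid, ip, tv, _, _, diff in trace
    ]


for row in row_number_trace(ROWS):
    print(row)
```

For IP A the difference is 0 for rows 1 to 3 and 3 for rows 4 to 9, so the 1/1/1 run and the final 0/0/0 run share the same difference. That is why temp_violation has to be part of the partition along with the difference: only the combination of the two separates every run.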