Grouping and counting rows by value until it changes
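I have a table where messages are stored as they happen. Usually there is a message 'A', and sometimes the A's are separated by a single message 'B'.
Now I want to group the values so that I can analyze them, for example to find the longest 'A'-streak or the distribution of the 'A'-streaks.
I have already tried a COUNT-OVER query, but it just keeps counting across every message: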
SELECT message, COUNT(*) OVER (ORDER BY Timestamp RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
Here is my example data:
Timestamp Message
20150329 00:00 A
20150329 00:01 A
20150329 00:02 B
20150329 00:03 A
20150329 00:04 A
20150329 00:05 A
20150329 00:06 B
I want the following output:
Message COUNT
A 2
B 1
A 3
B 1
That was an interesting one :)
;WITH cte as (
SELECT Messages.Message, Timestamp,
ROW_NUMBER() OVER(PARTITION BY Message ORDER BY Timestamp) AS gn,
ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
FROM Messages
), cte2 AS (
SELECT Message, Timestamp, gn, rn, gn - rn as gb
FROM cte
), cte3 AS (
SELECT Message, MIN(Timestamp) As Ts, COUNT(1) as Cnt
FROM cte2
GROUP BY Message, gb)
SELECT Message, Cnt FROM cte3
ORDER BY Ts
Here is the result set:
Message Cnt
A 2
B 1
A 3
B 1
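The query could probably be shorter, but I am posting it this way so you can see what is going on.
The result is exactly as requested. The most important part is gn - rn.
The idea is to number the rows within each partition (gn) and, at the same time, number the rows across the whole set (rn). Subtracting one from the other gives a value that stays constant within each consecutive run of the same message, so together with Message it acts as a 'rank' that identifies each group.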
;WITH cte as (
SELECT Messages.Message, Timestamp,
ROW_NUMBER() OVER(PARTITION BY Message ORDER BY Timestamp) AS gn,
ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
FROM Messages
), cte2 AS (
SELECT Message, Timestamp, gn, rn, gn - rn as gb
FROM cte
)
SELECT * FROM cte2
Message Timestamp gn rn gb
A 2015-03-29 00:00:00.000 1 1 0
A 2015-03-29 00:01:00.000 2 2 0
B 2015-03-29 00:02:00.000 1 3 -2
A 2015-03-29 00:03:00.000 3 4 -1
A 2015-03-29 00:04:00.000 4 5 -1
A 2015-03-29 00:05:00.000 5 6 -1
B 2015-03-29 00:06:00.000 2 7 -5
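For anyone who wants to try the row-number difference without a SQL Server instance, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It assumes a SQLite build with window function support (3.25 or later), and it collapses the three CTEs into one; the helper name count_streaks is made up for this illustration.

```python
import sqlite3

# Sample rows from the question; table and column names mirror the query above.
ROWS = [
    ("2015-03-29 00:00", "A"),
    ("2015-03-29 00:01", "A"),
    ("2015-03-29 00:02", "B"),
    ("2015-03-29 00:03", "A"),
    ("2015-03-29 00:04", "A"),
    ("2015-03-29 00:05", "A"),
    ("2015-03-29 00:06", "B"),
]

# gn - rn is constant inside each consecutive run of one message,
# so (Message, gn - rn) identifies a streak.
QUERY = """
WITH cte AS (
    SELECT Message, Timestamp,
           ROW_NUMBER() OVER (PARTITION BY Message ORDER BY Timestamp) AS gn,
           ROW_NUMBER() OVER (ORDER BY Timestamp) AS rn
    FROM Messages
)
SELECT Message, COUNT(*) AS Cnt
FROM cte
GROUP BY Message, gn - rn
ORDER BY MIN(Timestamp)
"""


def count_streaks(rows):
    """Hypothetical helper: load rows into SQLite and return (message, count) per streak."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Messages (Timestamp TEXT, Message TEXT)")
    conn.executemany("INSERT INTO Messages VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


streaks = count_streaks(ROWS)
print(streaks)
```

Grouping by gn - rn alone would not be safe in general, because two different messages can end up with the same difference; that is why Message stays in the GROUP BY.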
Here is a shorter solution:
DECLARE @t TABLE ( d DATE, m CHAR(1) )
INSERT INTO @t
VALUES ( '20150301', 'A' ),
( '20150302', 'A' ),
( '20150303', 'B' ),
( '20150304', 'A' ),
( '20150305', 'A' ),
( '20150306', 'A' ),
( '20150307', 'B' );
WITH
c1 AS(SELECT d, m, IIF(LAG(m, 1, m) OVER(ORDER BY d) = m, 0, 1) AS n FROM @t),
c2 AS(SELECT m, SUM(n) OVER(ORDER BY d) AS n FROM c1)
SELECT m, COUNT(*) AS c
FROM c2
GROUP BY m, n
ORDER BY n -- n grows with every change, so this returns the streaks in chronological order
Output:
m c
A 2
B 1
A 3
B 1
The idea is to get the value 1 at every row where the message changes:
2015-03-01 A 0
2015-03-02 A 0
2015-03-03 B 1
2015-03-04 A 1
2015-03-05 A 0
2015-03-06 A 0
2015-03-07 B 1
The second step is just a running sum: the current row's value plus the sum of all preceding values:
2015-03-01 A 0
2015-03-02 A 0
2015-03-03 B 1
2015-03-04 A 2
2015-03-05 A 2
2015-03-06 A 2
2015-03-07 B 3
This way you can group by the message column together with the calculated column.
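The change-flag plus running-sum steps can also be tried in SQLite from Python. This is only a sketch under a few assumptions: a SQLite build with window functions (3.25 or later), an ordinary table named t standing in for the @t table variable, and CASE in place of IIF for portability. The a_streak_stats helper is hypothetical and shows one way to get at what the question asks for, the longest 'A'-streak and the distribution of 'A'-streak lengths.

```python
import sqlite3
from collections import Counter

# Same sample data as the @t table variable above.
ROWS = [
    ("2015-03-01", "A"),
    ("2015-03-02", "A"),
    ("2015-03-03", "B"),
    ("2015-03-04", "A"),
    ("2015-03-05", "A"),
    ("2015-03-06", "A"),
    ("2015-03-07", "B"),
]

# Step 1 (c1): flag rows where the message differs from the previous row.
# Step 2 (c2): a running sum of the flags gives every streak its own number.
QUERY = """
WITH
c1 AS (SELECT d, m,
              CASE WHEN LAG(m, 1, m) OVER (ORDER BY d) = m THEN 0 ELSE 1 END AS n
       FROM t),
c2 AS (SELECT m, SUM(n) OVER (ORDER BY d) AS grp FROM c1)
SELECT m, COUNT(*) AS c
FROM c2
GROUP BY m, grp
ORDER BY grp
"""


def a_streak_stats(streaks):
    """Hypothetical helper: longest 'A' streak and a histogram of 'A' streak lengths."""
    lengths = [count for message, count in streaks if message == "A"]
    return max(lengths, default=0), Counter(lengths)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT, m TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
streaks = conn.execute(QUERY).fetchall()
conn.close()

longest, distribution = a_streak_stats(streaks)
print(streaks)
print(longest, dict(distribution))
```

Because the running sum only increases when the message changes, it is enough to order by that column to get the streaks back in chronological order.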