Get the count of statuses by date, but only count continuous rows
I have this input data:
ID Name Status Date
1 Machine1 Active 2018-01-01
2 Machine2 Fault 2018-01-01
3 Machine3 Active 2018-01-01
4 Machine1 Fault 2018-01-02
5 Machine2 Active 2018-01-02
6 Machine3 Active 2018-01-02
7 Machine2 Active 2018-01-03
8 Machine1 Fault 2018-01-03
9 Machine2 Active 2018-01-04
10 Machine1 Fault 2018-01-04
11 Machine3 Active 2018-01-06
I want to produce this output from that data (expected output):
Name Last Status Count
Machine1 Fault 3
Machine2 Active 3
Machine3 Active 1 (because the dates are not continuous)
*Count: the number of rows with the last status in its continuous (consecutive-day) history
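To try the queries in the answers, the sample data can be loaded with something like the following. This is only a sketch: the table name testdata is borrowed from one of the answers below (the other answer calls the table t), and the column types are assumptions, since they are not stated anywhere in the question.

```sql
-- Assumed schema: the column types are guesses, not taken from the question.
CREATE TABLE testdata (
    [ID]     INT         NOT NULL PRIMARY KEY,
    [Name]   VARCHAR(50) NOT NULL,
    [Status] VARCHAR(20) NOT NULL,
    [Date]   DATE        NOT NULL
);

-- The eleven rows listed in the question.
INSERT INTO testdata ([ID], [Name], [Status], [Date]) VALUES
    (1,  'Machine1', 'Active', '2018-01-01'),
    (2,  'Machine2', 'Fault',  '2018-01-01'),
    (3,  'Machine3', 'Active', '2018-01-01'),
    (4,  'Machine1', 'Fault',  '2018-01-02'),
    (5,  'Machine2', 'Active', '2018-01-02'),
    (6,  'Machine3', 'Active', '2018-01-02'),
    (7,  'Machine2', 'Active', '2018-01-03'),
    (8,  'Machine1', 'Fault',  '2018-01-03'),
    (9,  'Machine2', 'Active', '2018-01-04'),
    (10, 'Machine1', 'Fault',  '2018-01-04'),
    (11, 'Machine3', 'Active', '2018-01-06');
```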
I think this will work, although SQLFiddle is down at the moment, so I can't test it:
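SELECT [Name], [Status], ct AS [Count]
FROM (
    SELECT
        [Name],
        [Status],
        [Date],
        -- grp summed over the previous and current row, zeroed when the current row starts a new block
        1 + (SUM(grp) OVER (PARTITION BY [Name], [Status] ORDER BY [Date] ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) * grp) AS ct,
        -- rn = 1 marks the most recent row for each machine
        ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY [Date] DESC) AS rn
    FROM (
        -- grp = 1 when the previous row for this machine and status is exactly one day earlier
        SELECT *,
            CASE WHEN LAG([Date]) OVER (PARTITION BY [Name], [Status] ORDER BY [Date]) = DATEADD(day, -1, [Date]) THEN 1 ELSE 0 END AS grp
        FROM t
    ) x
) y
WHERE rn = 1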
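It first uses LAG to compare the current row with the previous one (partitioning the data by machine name and status, ordered by date). If the current date is exactly one day after the previous date, it records a 1, otherwise a 0.
These 1s and 0s are then summed over a window that resets when the machine name or status changes (the partition of the SUM() OVER()). Note that the frame as written only spans the previous and current row, so ct can never exceed 3: that is enough for the sample data, but a longer streak would need a true per-block running total.
We also only want the latest record for each machine, so we partition by machine name alone, number the rows by date descending, and then pick (using the WHERE clause) the row numbered 1 for each machine.
It actually makes more sense if you run the queries separately, like this:
Calculate "is the current report consecutive with the previous report, for the given status and machine" (1 = yes, 0 = no):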
SELECT *,
    CASE WHEN LAG([Date]) OVER (PARTITION BY [Name], [Status] ORDER BY [Date]) = DATEADD(day, -1, [Date]) THEN 1 ELSE 0 END AS grp
FROM t
Calculate "what is the running total of the current block of consecutive reports":
SELECT
    [Name],
    [Status],
    [Date],
    1 + (SUM(grp) OVER (PARTITION BY [Name], [Status] ORDER BY [Date] ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) * grp) AS ct,
    ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY [Date] DESC) AS rn
FROM (
    SELECT *,
        CASE WHEN LAG([Date]) OVER (PARTITION BY [Name], [Status] ORDER BY [Date]) = DATEADD(day, -1, [Date]) THEN 1 ELSE 0 END AS grp
    FROM t
) x
And then, of course, the whole thing, but without the WHERE clause, so you can see the data we're discarding:
SELECT [Name], [Status], ct AS [Count]
FROM (
    SELECT
        [Name],
        [Status],
        [Date],
        1 + (SUM(grp) OVER (PARTITION BY [Name], [Status] ORDER BY [Date] ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) * grp) AS ct,
        ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY [Date] DESC) AS rn
    FROM (
        SELECT *,
            CASE WHEN LAG([Date]) OVER (PARTITION BY [Name], [Status] ORDER BY [Date]) = DATEADD(day, -1, [Date]) THEN 1 ELSE 0 END AS grp
        FROM t
    ) x
) y
SQLFiddle finally woke up.
I believe it is as simple as this:
WITH cte1 AS (
SELECT
Name,
Status,
DATEADD(DAY, ROW_NUMBER() OVER (PARTITION BY Name, Status ORDER BY Date DESC) - 1, Date) AS GroupingDate
FROM testdata
), cte2 AS (
SELECT
Name,
Status,
RANK() OVER (PARTITION BY Name ORDER BY GroupingDate DESC) AS GroupingNumber
FROM cte1
)
SELECT Name, Status AS LastStatus, COUNT(*) AS LastStatusCount
FROM cte2
WHERE GroupingNumber = 1
GROUP BY Name, Status
ORDER BY Name
| Name | LastStatus | LastStatusCount |
|----------|------------|-----------------|
| Machine1 | Fault | 3 |
| Machine2 | Active | 3 |
| Machine3 | Active | 1 |
To understand how this works, look at the intermediate values the CTEs generate:
| Name | Status | Date | RowNumber | GroupingDate | GroupingNumber |
|----------|--------|---------------------|-----------|---------------------|----------------|
| Machine1 | Fault | 04/01/2018 00:00:00 | 1 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Fault | 03/01/2018 00:00:00 | 2 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Fault | 02/01/2018 00:00:00 | 3 | 04/01/2018 00:00:00 | 1 |
| Machine1 | Active | 01/01/2018 00:00:00 | 1 | 01/01/2018 00:00:00 | 4 |
| Machine2 | Active | 04/01/2018 00:00:00 | 1 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Active | 03/01/2018 00:00:00 | 2 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Active | 02/01/2018 00:00:00 | 3 | 04/01/2018 00:00:00 | 1 |
| Machine2 | Fault | 01/01/2018 00:00:00 | 1 | 01/01/2018 00:00:00 | 4 |
| Machine3 | Active | 06/01/2018 00:00:00 | 1 | 06/01/2018 00:00:00 | 1 |
| Machine3 | Active | 02/01/2018 00:00:00 | 2 | 03/01/2018 00:00:00 | 2 |
| Machine3 | Active | 01/01/2018 00:00:00 | 3 | 03/01/2018 00:00:00 | 2 |
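The trick here is that if a sequence of numbers is consecutive, subtracting consecutive numbers from it yields the same value each time. For example, if we have 5, 6, 8, 9, then subtracting 1, 2, 3, 4 in that order produces 4, 4, 5, 5. The query applies the same idea in reverse: the dates are numbered from newest to oldest, so adding the row number minus 1 to each date gives every row in a consecutive run the same GroupingDate.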
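A minimal sketch of that arithmetic in isolation, using an inline VALUES list rather than the real table (the aliases v, n, rn and grp are made up for illustration):

```sql
SELECT
    n,
    ROW_NUMBER() OVER (ORDER BY n) AS rn,
    n - ROW_NUMBER() OVER (ORDER BY n) AS grp   -- equal within each run of consecutive values
FROM (VALUES (5), (6), (8), (9)) AS v(n)
ORDER BY n;
-- n = 5, 6, 8, 9 gives rn = 1, 2, 3, 4 and grp = 4, 4, 5, 5
```

Grouping by grp then isolates each run of consecutive values, which is what the RANK() step in the query does with GroupingDate.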