SQL statement that calculates per-interval growth
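In our database we have a table that keeps track of the power consumption of a device. The rate at which new values are inserted is not fixed: a row is only written when the value actually changes, so the time between rows varies from 1 second to several minutes. Each entry consists of a timestamp and a value. The value always increases from row to row, because it counts kWh.
What I want to achieve is this: I want to specify a start and an end datetime, say one month. I also want to specify an interval such as 15 minutes, 1 hour, 1 day or similar. The result I need has the form [start of interval as datetime], [power consumption in that interval], for example like this (with the interval set to 1 hour):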
2015-01-01 08:00:00 - 65
2015-01-01 09:00:00 - 43
2015-01-01 10:00:00 - 56
This is what the table looks like:
TimeStamp Value
-------------------------
2015-01-08 08:29:47, 5246
2015-01-08 08:36:15, 5247
2015-01-08 08:37:10, 5248
2015-01-08 08:38:01, 5249
2015-01-08 08:38:38, 5250
2015-01-08 08:38:51, 5251
2015-01-08 08:39:33, 5252
2015-01-08 08:40:20, 5253
2015-01-08 08:41:10, 5254
2015-01-09 08:56:25, 5255
2015-01-09 08:56:43, 5256
2015-01-09 08:57:31, 5257
2015-01-09 08:57:36, 5258
2015-01-09 08:58:02, 5259
2015-01-09 08:58:57, 5260
2015-01-09 08:59:27, 5261
2015-01-09 09:00:06, 5262
2015-01-09 09:00:59, 5263
2015-01-09 09:01:54, 5265
2015-01-09 09:02:44, 5266
2015-01-09 09:03:39, 5267
2015-01-09 09:04:22, 5268
2015-01-09 09:05:11, 5269
2015-01-09 09:06:08, 5270
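I think I have to combine the SUM() function with GROUP BY, but I don't know how, because as far as I can see I would have to consider only the growth within each interval rather than the sum of the absolute values in that interval. It would be great if someone could put me on the right track.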
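I think the best way to tackle this is to generate your intervals first and then LEFT JOIN your data to them. This makes grouping by a variable interval much less complicated, and it also means you still get a result row for intervals that contain no data. To do this you need a numbers table. Since many people don't have one, here is a quick way to generate the numbers on the fly: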
    WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),               -- 10 rows
    N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),                                          -- 10 x 10 = 100 rows
    Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2)       -- 100 x 100 = 10,000 rows
    SELECT *
    FROM Numbers;
This simply generates a sequence from 1 to 10,000. For further reading on the technique, see the following series:
- Generate a set or sequence without loops – part 1
- Generate a set or sequence without loops – part 2
- Generate a set or sequence without loops – part 3
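You can then define a start time, an interval length and the number of intervals to show, and generate the interval start times from the numbers: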
    DECLARE @Start DATETIME2 = '2015-01-09 08:00',
            @Interval INT = 60,      -- INTERVAL IN MINUTES
            @IntervalCount INT = 3;  -- NUMBER OF INTERVALS TO SHOW
    WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
    N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
    Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2)
    SELECT TOP (@IntervalCount)
           Interval = DATEADD(MINUTE, (N - 1) * @Interval, @Start)
    FROM Numbers
    ORDER BY N;  -- without ORDER BY, TOP is not guaranteed to return the lowest numbers
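The question asks for a start and an end datetime rather than an interval count. As a minimal sketch (the `@End` variable is my own addition, not part of the answer), the count could be derived from the two datetimes like this:

```sql
DECLARE @Start DATETIME2 = '2015-01-01 00:00';
DECLARE @End   DATETIME2 = '2015-02-01 00:00';  -- hypothetical end of the reporting range
DECLARE @Interval INT = 60;                     -- interval in minutes

-- Round up so that a partial last interval is still included.
DECLARE @IntervalCount INT = CEILING(1.0 * DATEDIFF(MINUTE, @Start, @End) / @Interval);

SELECT IntervalCount = @IntervalCount;  -- 744 for January 2015 with 60-minute intervals
```

Keep in mind that the numbers CTE above yields at most 10,000 rows, so the count has to stay below that unless another level of cross join is added.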
Finally, you can LEFT JOIN your data to the intervals to get the minimum and maximum value for each interval:
    DECLARE @Start DATETIME2 = '2015-01-09 08:00',
            @Interval INT = 60,      -- INTERVAL IN MINUTES
            @IntervalCount INT = 3;  -- NUMBER OF INTERVALS TO SHOW
    WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
    N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
    Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2),
    Intervals AS
    (   SELECT TOP (@IntervalCount)
               IntervalStart = DATEADD(MINUTE, (N - 1) * @Interval, @Start),
               IntervalEnd = DATEADD(MINUTE, N * @Interval, @Start)
        FROM Numbers AS n
        ORDER BY N  -- without ORDER BY, TOP is not guaranteed to return the lowest numbers
    )
    SELECT i.IntervalStart,
           MinVal = MIN(t.Value),
           MaxVal = MAX(t.Value),
           Difference = ISNULL(MAX(t.Value) - MIN(t.Value), 0)  -- 0 for intervals without rows
    FROM Intervals AS i
    LEFT JOIN T AS t
        ON t.timestamp >= i.IntervalStart
        AND t.timestamp < i.IntervalEnd
    GROUP BY i.IntervalStart;
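The query above reads from a table called T with the columns timestamp and Value. To try it against the sample data from the question, a setup along these lines should work. The column types are an assumption on my part, since the question does not state them:

```sql
-- Hypothetical setup: names follow the query above, the types are assumed.
CREATE TABLE T
(
    [timestamp] DATETIME2 NOT NULL,
    Value       INT       NOT NULL
);

INSERT INTO T ([timestamp], Value)
VALUES ('2015-01-08 08:29:47', 5246),
       ('2015-01-08 08:36:15', 5247),
       ('2015-01-08 08:37:10', 5248),
       ('2015-01-08 08:38:01', 5249),
       ('2015-01-08 08:38:38', 5250),
       ('2015-01-08 08:38:51', 5251),
       ('2015-01-08 08:39:33', 5252),
       ('2015-01-08 08:40:20', 5253),
       ('2015-01-08 08:41:10', 5254),
       ('2015-01-09 08:56:25', 5255),
       ('2015-01-09 08:56:43', 5256),
       ('2015-01-09 08:57:31', 5257),
       ('2015-01-09 08:57:36', 5258),
       ('2015-01-09 08:58:02', 5259),
       ('2015-01-09 08:58:57', 5260),
       ('2015-01-09 08:59:27', 5261),
       ('2015-01-09 09:00:06', 5262),
       ('2015-01-09 09:00:59', 5263),
       ('2015-01-09 09:01:54', 5265),
       ('2015-01-09 09:02:44', 5266),
       ('2015-01-09 09:03:39', 5267),
       ('2015-01-09 09:04:22', 5268),
       ('2015-01-09 09:05:11', 5269),
       ('2015-01-09 09:06:08', 5270);
```

With this data and the parameters used above (start 08:00 on 2015-01-09, three 60-minute intervals), the differences come out as 6 for 08:00 (5255 to 5261), 8 for 09:00 (5262 to 5270) and 0 for 10:00. The step from 5261 to 5262, which happens across the 09:00 boundary, is counted in neither interval.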
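If your values can go up and down within an interval, you will need a ranking function to pick the first and last record of each interval instead of the minimum and maximum: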
    DECLARE @Start DATETIME2 = '2015-01-09 08:00',
            @Interval INT = 60,      -- INTERVAL IN MINUTES
            @IntervalCount INT = 3;  -- NUMBER OF INTERVALS TO SHOW
    WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
    N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
    Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2),
    Intervals AS
    (   SELECT TOP (@IntervalCount)
               IntervalStart = DATEADD(MINUTE, (N - 1) * @Interval, @Start),
               IntervalEnd = DATEADD(MINUTE, N * @Interval, @Start)
        FROM Numbers AS n
        ORDER BY N  -- without ORDER BY, TOP is not guaranteed to return the lowest numbers
    ), RankedData AS
    (   SELECT i.IntervalStart,
               t.Value,
               t.timestamp,
               RowNum = ROW_NUMBER() OVER(PARTITION BY i.IntervalStart ORDER BY t.timestamp),  -- 1 = first row of the interval
               TotalRows = COUNT(*) OVER(PARTITION BY i.IntervalStart)                         -- RowNum = TotalRows marks the last row
        FROM Intervals AS i
        LEFT JOIN T AS t
            ON t.timestamp >= i.IntervalStart
            AND t.timestamp < i.IntervalEnd
    )
    SELECT r.IntervalStart,
           Difference = ISNULL(MAX(CASE WHEN RowNum = TotalRows THEN r.Value END) -
                               MAX(CASE WHEN RowNum = 1 THEN r.Value END), 0)
    FROM RankedData AS r
    WHERE RowNum = 1
       OR TotalRows = RowNum
    GROUP BY r.IntervalStart;
Example on SQL Fiddle with 1 Hour intervals
Example on SQL Fiddle with 15 minute intervals
Example on SQL Fiddle with 1 Day intervals
EDIT
As noted in the comments, neither of the solutions above accounts for the advance across interval boundaries. The following query does:
    DECLARE @Start DATETIME2 = '2015-01-09 08:25',
            @Interval INT = 5,        -- INTERVAL IN MINUTES
            @IntervalCount INT = 18;  -- NUMBER OF INTERVALS TO SHOW
    WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
    N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
    Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2),
    Intervals AS
    (   SELECT TOP (@IntervalCount)
               IntervalStart = DATEADD(MINUTE, (N - 1) * @Interval, @Start),
               IntervalEnd = DATEADD(MINUTE, (N - 0) * @Interval, @Start)
        FROM Numbers AS n
        ORDER BY N  -- without ORDER BY, TOP is not guaranteed to return the lowest numbers
    ), LeadData AS
    (   -- Pair every row with the next row and compute the advance per second between them
        SELECT T.timestamp,
               T.Value,
               NextValue = nxt.value,
               AdvanceRate = ISNULL(1.0 * (nxt.Value - T.Value) / DATEDIFF(SECOND, T.timestamp, nxt.timestamp), 0),
               NextTimestamp = nxt.timestamp
        FROM T AS T
        OUTER APPLY
        (   SELECT TOP 1 T2.timestamp, T2.value
            FROM T AS T2
            WHERE T2.timestamp > T.timestamp
            ORDER BY T2.timestamp
        ) AS nxt
    )
    SELECT i.IntervalStart,
           Advance = CAST(ISNULL(SUM(DATEDIFF(SECOND, d.StartTime, d.EndTime) * t.AdvanceRate), 0) AS DECIMAL(10, 4))
    FROM Intervals AS i
    LEFT JOIN LeadData AS t
        ON t.NextTimestamp >= i.IntervalStart
        AND t.timestamp < i.IntervalEnd
    OUTER APPLY
    (   -- Clip the row-to-next-row period to the interval
        SELECT CASE WHEN t.timestamp > i.IntervalStart THEN t.timestamp ELSE i.IntervalStart END,
               CASE WHEN t.NextTimestamp < i.IntervalEnd THEN t.NextTimestamp ELSE i.IntervalEnd END
    ) AS d (StartTime, EndTime)
    GROUP BY i.IntervalStart;
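To make the boundary handling concrete, here is a small sketch of the same arithmetic for a single pair of rows from the sample data, 08:59:27 (5261) and 09:00:06 (5262), which straddle 09:00. The variable and column names are purely illustrative:

```sql
-- Two consecutive sample rows that straddle the 09:00 boundary.
DECLARE @t1 DATETIME2 = '2015-01-09 08:59:27', @v1 INT = 5261;
DECLARE @t2 DATETIME2 = '2015-01-09 09:00:06', @v2 INT = 5262;
DECLARE @Boundary DATETIME2 = '2015-01-09 09:00:00';

SELECT GapSeconds    = DATEDIFF(SECOND, @t1, @t2),        -- 39 seconds between the two rows
       SecondsBefore = DATEDIFF(SECOND, @t1, @Boundary),  -- 33 seconds fall before 09:00
       SecondsAfter  = DATEDIFF(SECOND, @Boundary, @t2),  -- 6 seconds fall after 09:00
       -- Share of the 1 kWh step attributed to each side, assuming a linear advance
       AdvanceBefore = 1.0 * (@v2 - @v1) * DATEDIFF(SECOND, @t1, @Boundary) / DATEDIFF(SECOND, @t1, @t2),
       AdvanceAfter  = 1.0 * (@v2 - @v1) * DATEDIFF(SECOND, @Boundary, @t2) / DATEDIFF(SECOND, @t1, @t2);
```

So 33/39 of the step is credited to the interval ending at 09:00 and 6/39 to the interval starting at 09:00, which is what the clipped `StartTime`/`EndTime` pair in the query works out for every pair of rows.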
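A quick way to do this is to take the date plus the hour from your timestamp and then GROUP BY it; the power consumption is then MAX(Value) - MIN(Value). You can manipulate the TimeStamp in other ways to get different intervals; this example only covers hourly consumption.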
    SELECT
        -- Rebuild the timestamp as 'yyyy-mm-dd h:00:00', i.e. truncated to the hour
        CONVERT(datetime, CONVERT(varchar(10), TimeStamp, 120) + ' ' + CONVERT(varchar(2), DATEPART(hour, TimeStamp)) + ':00:00') AS IntervalStart,
        MAX(Value) - MIN(Value) AS Value
    FROM [Table]
    GROUP BY CONVERT(datetime, CONVERT(varchar(10), TimeStamp, 120) + ' ' + CONVERT(varchar(2), DATEPART(hour, TimeStamp)) + ':00:00')
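As one possible way of "manipulating the TimeStamp in other ways", the bucket start can also be computed with integer arithmetic instead of string conversion, which lets the interval length be a parameter. This is a sketch rather than part of the answer above; `@Interval` and `@Anchor` are names I introduced, and the anchor is an arbitrary midnight that the buckets are aligned to:

```sql
DECLARE @Interval INT = 15;                -- bucket width in minutes
DECLARE @Anchor DATETIME2 = '2000-01-01';  -- arbitrary fixed origin (assumption); must be earlier than the data

SELECT DATEADD(MINUTE,
               (DATEDIFF(MINUTE, @Anchor, [TimeStamp]) / @Interval) * @Interval,  -- integer division floors to the bucket
               @Anchor) AS IntervalStart,
       MAX(Value) - MIN(Value) AS Value
FROM [Table]
GROUP BY DATEADD(MINUTE,
                 (DATEDIFF(MINUTE, @Anchor, [TimeStamp]) / @Interval) * @Interval,
                 @Anchor)
ORDER BY IntervalStart;
```

Like the hourly version, this only returns intervals that contain at least one row, and it does not count the growth between the last row of one interval and the first row of the next.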
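Your sample data does not line up with the result intervals, so you can miss the increase at the beginning or end of an interval.
I therefore assume a linear increase between consecutive sample rows and match those sample periods to the result intervals.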
    declare @start datetime2 = '2015-01-09 09:00:00';
    declare @end datetime2 = '2015-01-09 09:30:00';
    declare @intervalMinutes int = 5;
    -- @T is assumed to be a table variable (Timestamp, Value) holding the sample data
    with intervals as (
        select @start iStart, dateadd(minute, @intervalMinutes, @start) iEnd
        union all
        select iEnd, dateadd(minute, @intervalMinutes, iEnd) from intervals
        where iEnd < @end
    ), increases as (
        select
            T.Timestamp sStart,
            lead(T.Timestamp, 1, null) over (order by T.Timestamp) sEnd, -- the start of the next period if there is one, null else
            lead(T.value, 1, null) over (order by T.Timestamp) - T.value increase -- the increase within this period
        from @T T
    ), rates as (
        select
            sStart rStart,
            sEnd rEnd,
            (cast(increase as float))/datediff(second, sStart, sEnd) rate -- increase/second
        from increases where increase is not null
    ), samples as (
        select *,
            case when iStart > rStart then iStart else rStart end sStart, -- debug
            case when rEnd > iEnd then iEnd else rEnd end sEnd, -- debug
            datediff(second, case when iStart > rStart then iStart else rStart end, case when rEnd > iEnd then iEnd else rEnd end)*rate x -- increase within the period within the interval
        from intervals i
        left join rates r on rStart between iStart and iEnd or rEnd between iStart and iEnd or iStart between rStart and rEnd -- overlaps
    )
    select iStart, iEnd, isnull(sum(x), 0) as increase from samples
    group by iStart, iEnd;
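The CTEs:
- intervals holds the intervals you want data for
- increases calculates the increase within each sample period (from one row to the next)
- rates calculates the increase per second within each sample period
- samples matches the result intervals to the sample periods, taking the overlap at the boundaries into account
Finally, the last select sums up the sample periods that match a single interval.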
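Notes:
- For a number of intervals > [your max recursion depth] you have to use another solution to create the intervals CTE (see @GarethD's solution)
- Debugging hint: simply use select * from samples and you can see which sample periods are matched to each result interval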
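On the first note: SQL Server stops a recursive CTE after 100 recursion levels by default. Besides switching to a numbers-based intervals CTE, the limit can be raised per query with the MAXRECURSION hint (0 removes it entirely). A minimal sketch for one month of 15-minute intervals, reusing the variable names from the query above:

```sql
DECLARE @start datetime2 = '2015-01-01 00:00:00';
DECLARE @end datetime2 = '2015-02-01 00:00:00';
DECLARE @intervalMinutes int = 15;

WITH intervals AS (
    SELECT @start AS iStart, DATEADD(MINUTE, @intervalMinutes, @start) AS iEnd
    UNION ALL
    SELECT iEnd, DATEADD(MINUTE, @intervalMinutes, iEnd)
    FROM intervals
    WHERE iEnd < @end
)
SELECT iStart, iEnd
FROM intervals
OPTION (MAXRECURSION 0);  -- 0 = no limit; the default of 100 would abort this query
```

This produces 2,976 intervals (31 days times 96 per day). Removing the limit means a wrong termination condition would loop until it is cancelled, so a finite value (the maximum is 32767) may be the safer choice.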