SQL sectioning/averaging based on different timetags/timestamps and user-chosen input (T-SQL)
I have the following problem that I would like to solve: I have a SQL data set that I want to section, for example this one:
OldTimetag OldValue
2012-05-03 12:47:00 5
2012-05-03 13:00:00 1.3
2012-05-03 13:21:00 7
2012-05-03 14:56:00 5
2012-05-03 14:57:00 0.3
.... ....
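Now I want to section (and/or average) the data into new timetags based on a user-chosen interval, for example every 15 minutes starting from the first timetag, i.e.: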
NewTimetag NewValue
2012-05-03 12:47:00 4.507
2012-05-03 13:02:00 1.3
.... ....
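The main constraint is that the value next to a timetag stays valid until the next timetag occurs. So the value 5 at timetag 2012-05-03 12:47:00 is valid for the next 13 minutes, until 13:00:00. The value for the first 15 minutes starting at 12:47:00 is therefore (13*5+2*1.3)/15 = 4.507. For the following 15 minutes, at timetag 13:02:00, the value is simply 1.3... (and so on)
So far I have concluded that it is best to build an "artificial table" first and then join it with the old table. I generate that table like this: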
DECLARE @intStart datetime, @intEnd datetime
SELECT @intStart =min(OldTimetag), @intEnd = MAX(OldTimetag)
FROM OldTable
where OldTimetag between '2012-05-03 12:47:00' and '2012-05-03 14:57:00'
Declare @ArtificalTable table (NewTimeTag datetime, NewValue Float)
Declare @MinuteSlicer Int
Set @MinuteSlicer = 15
Insert @ArtificalTable Select @intStart, null
While ( @intStart < @intEnd ) BEGIN
Insert @ArtificalTable
Select DATEADD(mi,@MinuteSlicer, @intStart), Null
Set @intStart = DATEADD(mi,@MinuteSlicer,@intStart)
If @intEnd <= DATEADD(mi,@MinuteSlicer,@intStart)
Break
End
This gives me output like this:
NewTimetag NewValue
2012-05-03 12:47:00 Null
2012-05-03 13:02:00 Null
.... ....
However, I am stuck on the next step, how to join the tables correctly. Can anyone give me a hint?
Here is one way to do it.
Sample data:
declare @data table(OldTimetag datetime2, OldValue numeric(5,2));
Insert into @data(OldTimetag, OldValue) Values
('2012-05-03 12:47:00', 5)
, ('2012-05-03 13:00:00', 1.3)
, ('2012-05-03 13:21:00', 7)
, ('2012-05-03 14:56:00', 5)
, ('2012-05-03 14:57:00', 0.3);
Your custom range size (in minutes):
declare @mins int = 15;
The list CTE quickly builds an ordered list of numbers from 0 to n, where n <= the number of minutes between the first and the last OldTimetag.
With list(n) as (
Select top(Select 1+DATEDIFF(minute, min(OldTimetag), max(OldTimetag)) From @data)
ROW_NUMBER() over(order by (select 1))-1
From (
Select 1 From (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x1(n)
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x2(n)
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x3(n)
) as x(n)
)
Select NewTimetag = DATEADD(minute, @mins*(l.n/@mins), MIN(r.startTime)), NewValue = AVG(d.oldValue)
From list l
Cross Join (Select startTime = min(OldTimetag) From @data) as r
Cross Apply (Select maxTimetag = MAX(OldTimetag) From @data Where OldTimetag <= DATEADD(minute, n, startTime)) as mx
Inner Join @data d on d.OldTimetag = mx.maxTimetag
Group By l.n/@mins
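Cross Join pairs every number in the ordered list with the first OldTimetag in the data.
Cross Apply gets, for each minute produced by the Cross Join, the nearest OldTimetag at or before that minute.
Inner Join then matches that nearest OldTimetag back to your data in order to retrieve oldValue.
Select simply computes the average for each range of @mins minutes, together with its NewTimetag.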
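To make the Cross Apply step easier to follow, here is a minimal sketch that shows only the per-minute expansion for the first 15-minute range, before any grouping. It assumes just the first two sample rows, and the column alias SampleTime is a name introduced here for illustration, not part of the query above.

```sql
-- Assumed reduced sample: only the first two readings.
DECLARE @data TABLE (OldTimetag datetime2, OldValue numeric(5,2));
INSERT INTO @data (OldTimetag, OldValue) VALUES
      ('2012-05-03 12:47:00', 5)
    , ('2012-05-03 13:00:00', 1.3);

-- Offsets 0..14: the fifteen minutes of the first range.
WITH list(n) AS (
    SELECT TOP (15) ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1
    FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS x(n)
)
SELECT SampleTime = DATEADD(minute, l.n, r.startTime)  -- hypothetical alias
     , d.OldValue
FROM list l
CROSS JOIN (SELECT startTime = MIN(OldTimetag) FROM @data) AS r
-- Latest reading at or before the sampled minute.
CROSS APPLY (SELECT maxTimetag = MAX(OldTimetag)
             FROM @data
             WHERE OldTimetag <= DATEADD(minute, l.n, r.startTime)) AS mx
INNER JOIN @data d ON d.OldTimetag = mx.maxTimetag
ORDER BY l.n;
```

The minutes 12:47 through 12:59 should carry the value 5 and the minutes 13:00 and 13:01 the value 1.3, so averaging these fifteen rows reproduces (13*5+2*1.3)/15 from the question. Because the query samples once per minute, it matches the time-weighted average only when the timetags fall on whole minutes.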
This works well for ranges of up to 1000 minutes between the minimum and maximum OldTimetag. If you need to go beyond that limit, you can add a fourth line (and more) to the list CTE:
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x4(n) -- up to 10,000 minutes
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x5(n) -- up to 100,000 minutes
...
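The same cross-join trick can also replace the WHILE loop from the question when all you need is the list of new timetags. The following is a sketch under assumptions: the boundaries @start and @end are hard-coded from the sample data, and the CTE names digits and nums are made up for this example.

```sql
DECLARE @mins int = 15;
DECLARE @start datetime2 = '2012-05-03 12:47:00';  -- assumed: first OldTimetag
DECLARE @end   datetime2 = '2012-05-03 14:57:00';  -- assumed: last OldTimetag

-- Four cross-joined copies of ten rows give the numbers 0..9999.
WITH digits(n) AS (
    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)
),
nums(n) AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1
    FROM digits a CROSS JOIN digits b CROSS JOIN digits c CROSS JOIN digits d
)
SELECT NewTimetag = DATEADD(minute, n * @mins, @start)
FROM nums
WHERE DATEADD(minute, n * @mins, @start) <= @end  -- keep ranges starting inside the data
ORDER BY n;
```

With the sample boundaries this should return the nine range starts from 12:47:00 to 14:47:00 in one set-based statement.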
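One approach is to determine the intervals (an interval is generated if it contains at least one timestamp), augment the values table with the next timestamp of each row, and then compute the average by joining it to the intervals table.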
IF OBJECT_ID('tempdb..#values') IS NOT NULL DROP TABLE #values
CREATE TABLE #values (pk int identity, time datetime, value numeric(10,4))
INSERT INTO #values VALUES ('2012-05-03 12:47:00', 5)
INSERT INTO #values VALUES ('2012-05-03 13:00:00', 1.3)
INSERT INTO #values VALUES ('2012-05-03 13:21:00', 7)
INSERT INTO #values VALUES ('2012-05-03 14:56:00', 5)
INSERT INTO #values VALUES ('2012-05-03 14:57:00', 0.3)
DECLARE @timeSpanMinutes int SET @timeSpanMinutes=15
DECLARE @startTime datetime, @endTtime datetime
SELECT @startTime=MIN(time) FROM #values
SELECT @endTtime =DATEADD(MINUTE,(DATEDIFF(MINUTE,@startTime,MAX(time))
/@timeSpanMinutes+1)*@timeSpanMinutes, @startTime) FROM #values -- MAX(time) multiple
SELECT intervals.start
, SUM(value*(DATEDIFF(MINUTE -- minutes in intersection of [start,end] and [time,next]
, CASE WHEN time<start THEN start ELSE time END -- Maximum(time,start)
, CASE WHEN next<DATEADD(MINUTE,@timeSpanMinutes,intervals.start) THEN next
ELSE DATEADD(MINUTE,@timeSpanMinutes,intervals.start) END -- Minimum(next,end)
)*1.0/@timeSpanMinutes)) as average
FROM
(SELECT DISTINCT DATEADD(MINUTE, (DATEDIFF(MINUTE,@startTime,time)
/@timeSpanMinutes)*@timeSpanMinutes, @startTime) AS start
FROM #values -- round start to multiple of @timeSpanMinutes
UNION SELECT DISTINCT DATEADD(MINUTE,@timeSpanMinutes+(DATEDIFF(MINUTE,@startTime,time)
/@timeSpanMinutes)*@timeSpanMinutes, @startTime)
FROM #values -- union distinct with same as above but shifted with @timeSpanMinutes
) intervals -- intervals start time (end is calculated as start + @timeSpanMinutes)
INNER JOIN
(SELECT v.*,ISNULL((SELECT MIN(time) FROM #values WHERE time>v.time),@endTtime) as next
FROM #values v -- add next column to #values
) vals
ON vals.next>=intervals.start and vals.time<=DATEADD(MINUTE,@timeSpanMinutes,start)
WHERE intervals.start<>@endTtime
GROUP BY intervals.start
ORDER BY intervals.start
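On SQL Server 2012 or later, the "next timestamp" column can be obtained with LEAD instead of a correlated subquery. The sketch below is a variation on the idea rather than the query above: it generates every range up to the last timestamp, drops the last reading because its duration is unknown, and divides by the minutes actually covered instead of by the range size, so the final partial range will differ. All names in it (spans, buckets, ov, CoveredMinutes, @first, @last) are assumptions made for this example.

```sql
DECLARE @data TABLE (OldTimetag datetime2, OldValue numeric(10,4));
INSERT INTO @data (OldTimetag, OldValue) VALUES
      ('2012-05-03 12:47:00', 5)
    , ('2012-05-03 13:00:00', 1.3)
    , ('2012-05-03 13:21:00', 7)
    , ('2012-05-03 14:56:00', 5)
    , ('2012-05-03 14:57:00', 0.3);

DECLARE @mins  int = 15;
DECLARE @first datetime2 = (SELECT MIN(OldTimetag) FROM @data);
DECLARE @last  datetime2 = (SELECT MAX(OldTimetag) FROM @data);

WITH digits(n) AS (
    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)
),
nums(k) AS (  -- 0..999, enough for 1000 ranges
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1
    FROM digits a CROSS JOIN digits b CROSS JOIN digits c
),
buckets AS (  -- range k covers [BucketStart, BucketEnd)
    SELECT BucketStart = DATEADD(minute, k * @mins, @first)
         , BucketEnd   = DATEADD(minute, (k + 1) * @mins, @first)
    FROM nums
    WHERE DATEADD(minute, k * @mins, @first) < @last
),
spans AS (    -- each reading is valid until the next one
    SELECT OldTimetag, OldValue
         , NextTimetag = LEAD(OldTimetag) OVER (ORDER BY OldTimetag)
    FROM @data
)
SELECT b.BucketStart
     , NewValue = SUM(s.OldValue * ov.CoveredMinutes)
                  / NULLIF(SUM(ov.CoveredMinutes), 0)
FROM buckets b
INNER JOIN spans s
        ON s.OldTimetag < b.BucketEnd
       AND s.NextTimetag > b.BucketStart  -- NULL NextTimetag (last row) never matches
CROSS APPLY (SELECT CoveredMinutes = DATEDIFF(minute
        , CASE WHEN s.OldTimetag  > b.BucketStart THEN s.OldTimetag  ELSE b.BucketStart END
        , CASE WHEN s.NextTimetag < b.BucketEnd   THEN s.NextTimetag ELSE b.BucketEnd   END)) AS ov
GROUP BY b.BucketStart
ORDER BY b.BucketStart;
```

For the first range the join should pick up 13 minutes at 5 and 2 minutes at 1.3, which gives the 4.507 (rounded) asked for in the question. The NULLIF guard avoids a division by zero when timestamps are not minute-aligned and an overlap rounds down to zero minutes.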