SQL sectioning/averaging based on different timetags/timestamps and user-chosen input (T-SQL)

I have the following problem that I would like to solve: I have a SQL data set that I want to section, for example this one:

OldTimetag                 OldValue
2012-05-03 12:47:00        5
2012-05-03 13:00:00        1.3
2012-05-03 13:21:00        7
2012-05-03 14:56:00        5
2012-05-03 14:57:00        0.3
....                       ....

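Now I want to section (and/or average) the data into new timetags based on a user-chosen interval, for example every 15 minutes starting from the first timetag, i.e.: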

NewTimetag                 NewValue
2012-05-03 12:47:00        4.507
2012-05-03 13:02:00        1.3 
....                       ....

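The main constraint is that the value next to a timetag stays valid until the next timetag appears. So the value 5 at timetag 2012-05-03 12:47:00 is valid for the next 13 minutes, until 13:00:00. The value for the first 15 minutes starting at 12:47:00 is therefore (13*5 + 2*1.3)/15 = 4.507. For the next 15 minutes, starting at 13:02:00, the value is simply 1.3... (and so on)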

So far I have concluded that it is best to build an "artificial table" first and then join it with the old table. I generate that table as follows:

DECLARE @intStart datetime, @intEnd datetime


SELECT @intStart =min(OldTimetag), @intEnd = MAX(OldTimetag)
FROM OldTable
    where OldTimetag between '2012-05-03 12:47:00' and '2012-05-03 14:57:00'

Declare @ArtificalTable table (NewTimeTag datetime, NewValue Float)
Declare @MinuteSlicer Int
Set @MinuteSlicer = 15
Insert @ArtificalTable Select @intStart, null

While ( @intStart < @intEnd ) BEGIN
    Insert @ArtificalTable
    Select DATEADD(mi,@MinuteSlicer, @intStart), Null 
        Set  @intStart = DATEADD(mi,@MinuteSlicer,@intStart)
        If @intEnd <= DATEADD(mi,@MinuteSlicer,@intStart) 
            Break
End 

This gives me output like this:

NewTimetag                 NewValue
2012-05-03 12:47:00        Null
2012-05-03 13:02:00        Null
....                       ....

However, I am stuck on the next step, which is joining the tables correctly. Can anyone give me a hint?

Here is one way to do it.

Sample data:

declare @data table(OldTimetag datetime2, OldValue numeric(5,2));
Insert into @data(OldTimetag, OldValue) Values
    ('2012-05-03 12:47:00', 5)
    , ('2012-05-03 13:00:00', 1.3)
    , ('2012-05-03 13:21:00', 7)
    , ('2012-05-03 14:56:00', 5)
    , ('2012-05-03 14:57:00', 0.3);

Your custom range size (in minutes):

declare @mins int = 15;

The list CTE quickly generates an ordered list of numbers from 0 to n, where n <= the number of minutes between the first and the last OldTimetag:

With list(n) as (
    Select top(Select 1+DATEDIFF(minute, min(OldTimetag), max(OldTimetag)) From @data) 
        ROW_NUMBER() over(order by (select 1))-1
    From (
        Select 1 From (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x1(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x2(n)
        Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x3(n)
    ) as x(n)
)
Select NewTimetag = DATEADD(minute, @mins*(l.n/@mins), MIN(r.startTime)), NewValue = AVG(d.oldValue)
From list l
Cross Join (Select startTime = min(OldTimetag) From @data) as r
Cross Apply (Select maxTimetag = MAX(OldTimetag) From @data Where OldTimetag <= DATEADD(minute, n, startTime)) as mx
Inner Join @data d on d.OldTimetag = mx.maxTimetag
Group By l.n/@mins
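  • Cross Join combines every number in the ordered list with the first OldTimetag in your data.
  • Cross Apply gets the nearest OldTimetag at or before each minute created by the Cross Join.
  • Inner Join then matches that nearest OldTimetag against your data in order to retrieve oldValue.
  • Select only has to compute the average for each range of @mins minutes, together with its NewTimetag.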
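
To make the steps in the list above easier to follow, here is a minimal sketch of the same query before the Group By is applied, restricted to the first 15-minute range. It is only an illustration under assumptions: the column aliases minuteTime and bucket are made up for this example, the sample rows are a subset of the data above, and the list CTE is shortened to 25 rows because only 15 are needed.

```sql
-- Sample rows copied from the example above (subset, for illustration only)
declare @data table(OldTimetag datetime2, OldValue numeric(5,2));
Insert into @data(OldTimetag, OldValue) Values
    ('2012-05-03 12:47:00', 5)
    , ('2012-05-03 13:00:00', 1.3)
    , ('2012-05-03 13:21:00', 7);

declare @mins int = 15;

-- Minutes 0..14 after the first OldTimetag, i.e. the first range only
With list(n) as (
    Select top(@mins) ROW_NUMBER() over(order by (select 1)) - 1
    From (values(1), (1), (1), (1), (1)) as x1(n)
    Cross Join (values(1), (1), (1), (1), (1)) as x2(n)
)
Select minuteTime = DATEADD(minute, l.n, r.startTime)  -- hypothetical alias: the minute being looked up
    , bucket = l.n / @mins                             -- hypothetical alias: integer division gives the range number
    , d.OldValue                                       -- value that is valid during that minute
From list l
Cross Join (Select startTime = min(OldTimetag) From @data) as r
Cross Apply (Select maxTimetag = MAX(OldTimetag) From @data
             Where OldTimetag <= DATEADD(minute, l.n, r.startTime)) as mx
Inner Join @data d on d.OldTimetag = mx.maxTimetag
Order By l.n;
```

With this data, the minutes from 12:47 to 12:59 should carry the value 5 and the minutes 13:00 and 13:01 the value 1.3, so averaging the 15 rows corresponds to the (13*5 + 2*1.3)/15 calculation from the question.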

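This works well for ranges of up to 1000 minutes between the smallest and the largest OldTimetag. If you need to go beyond that limit, you can add a 4th line (and more) to the list CTE: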

Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x4(n) -- up to 10,000 minutes
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x5(n) -- up to 100,000 minutes
...

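One way is to identify the intervals (an interval is generated if it contains at least one timestamp), augment the time table with the next timestamp, and then calculate the average for each interval by intersecting it with the intervals table.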

IF OBJECT_ID('tempdb..#values') IS NOT NULL DROP TABLE #values
CREATE TABLE #values (pk int identity, time datetime, value numeric(10,4))
INSERT INTO #values VALUES ('2012-05-03 12:47:00', 5)
INSERT INTO #values VALUES ('2012-05-03 13:00:00', 1.3)
INSERT INTO #values VALUES ('2012-05-03 13:21:00', 7)
INSERT INTO #values VALUES ('2012-05-03 14:56:00', 5)
INSERT INTO #values VALUES ('2012-05-03 14:57:00', 0.3)

DECLARE @timeSpanMinutes int SET @timeSpanMinutes=15
DECLARE @startTime datetime, @endTtime datetime
SELECT @startTime=MIN(time) FROM #values
SELECT @endTtime =DATEADD(MINUTE,(DATEDIFF(MINUTE,@startTime,MAX(time))
    /@timeSpanMinutes+1)*@timeSpanMinutes, @startTime) FROM #values -- MAX(time) multiple
SELECT intervals.start
  , SUM(value*(DATEDIFF(MINUTE -- minutes in intersection of [start,end] and [time,next]
     , CASE WHEN time<start THEN start ELSE time END -- Maximum(time,start)
     , CASE WHEN next<DATEADD(MINUTE,@timeSpanMinutes,intervals.start) THEN next 
            ELSE DATEADD(MINUTE,@timeSpanMinutes,intervals.start) END -- Minimum(next,end)
     )*1.0/@timeSpanMinutes)) as average
  FROM
  (SELECT DISTINCT DATEADD(MINUTE, (DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) AS start 
      FROM #values -- round start to multiple of @timeSpanMinutes
    UNION SELECT DISTINCT DATEADD(MINUTE,@timeSpanMinutes+(DATEDIFF(MINUTE,@startTime,time)
        /@timeSpanMinutes)*@timeSpanMinutes, @startTime) 
      FROM #values -- union distinct with same as above but shifted with @timeSpanMinutes
  ) intervals -- intervals start time (end is calculated as start + @timeSpanMinutes)
  INNER JOIN 
  (SELECT v.*,ISNULL((SELECT MIN(time) FROM #values WHERE time>v.time),@endTtime) as next 
       FROM #values v -- add next column to #values
  ) vals
  ON vals.next>=intervals.start and vals.time<=DATEADD(MINUTE,@timeSpanMinutes,start)
  WHERE intervals.start<>@endTtime
  GROUP BY intervals.start
  ORDER BY intervals.start
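
As a side note, the "next timestamp" column that the query above builds with a correlated subquery could also be produced with the LEAD window function, assuming SQL Server 2012 or later. This is an alternative technique and not part of the answer above; the table variable @vals and the alias nextTime are hypothetical names used only for this sketch.

```sql
-- Sample rows copied from the example above, in a hypothetical table variable
DECLARE @vals TABLE (time datetime, value numeric(10,4));
INSERT INTO @vals VALUES
    ('2012-05-03 12:47:00', 5),
    ('2012-05-03 13:00:00', 1.3),
    ('2012-05-03 13:21:00', 7),
    ('2012-05-03 14:56:00', 5),
    ('2012-05-03 14:57:00', 0.3);

-- LEAD returns the time of the following row in time order.
-- For the last row it returns NULL here; the query above substitutes @endTtime instead.
SELECT v.time
     , v.value
     , LEAD(v.time) OVER (ORDER BY v.time) AS nextTime
  FROM @vals v
  ORDER BY v.time;
```

Each row then carries the period [time, nextTime) during which its value is valid, which is the piece that gets intersected with every interval.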