SQL Moving Aggregation

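Problem: For a given record set containing a date column, any number of categorical columns, and a value column, I want to compute aggregates of the value over an arbitrary date window (e.g. 30 days, 365 days, etc.). I have looked at window aggregate functions, CTEs, and a few other features, but they don't seem (at least to me) to do what is needed.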

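The SQL (T-SQL) below represents the basic idea of what I am trying to accomplish, but I have a bad feeling about its scalability, particularly the join, and about the added difficulty once I try to group by additional nominal groups.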

    SELECT
          basedate
        , count(*) AS [n]
        , sum(value) AS [SumValue]
        , avg(value) AS [AverageValue]
        , stdev(value) AS [StdevValue]
    FROM
        (SELECT t1.basedate, t2.*
         FROM
            (SELECT DISTINCT dt AS basedate FROM foo) AS t1
            , foo AS t2
         WHERE datediff(d, t1.basedate, t2.dt) BETWEEN -30 AND 0
        ) t3
    GROUP BY t3.basedate
    ORDER BY t3.basedate DESC

I created a SQLFiddle to try to make this more concrete.

SQLFiddle

Thanks.

In my brief testing, this is faster than your current query if the dt field is indexed:

SELECT    a.dt AS basedate
        , count(*) as [n]
        , sum(b.Value) as [SumValue]
        , avg(b.value) As [AverageValue]
        , stdev(b.value) As [StdevValue]
FROM  foo a
JOIN  foo b
   ON b.dt BETWEEN DATEADD(DAY,-30,a.dt) AND a.dt
GROUP BY a.dt
ORDER BY a.dt DESC
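
The same self-join can also carry a category in the join condition, which is the "group by additional nominal groups" case from the question. This is only a minimal sketch: `category` is a hypothetical column name that does not appear in the posted query. It also uses a distinct list of (category, dt) pairs as the base, so a date that occurs several times is not counted more than once.

```sql
-- Sketch only: `category` is a hypothetical column, not part of the posted query.
SELECT    a.category
        , a.dt AS basedate
        , count(*) AS [n]
        , sum(b.value) AS [SumValue]
        , avg(b.value) AS [AverageValue]
        , stdev(b.value) AS [StdevValue]
FROM  (SELECT DISTINCT category, dt FROM foo) a   -- one base row per category and date
JOIN  foo b
   ON b.category = a.category                     -- stay within the same group
  AND b.dt BETWEEN DATEADD(DAY, -30, a.dt) AND a.dt
GROUP BY a.category, a.dt
ORDER BY a.category, a.dt DESC
```

An index that leads with the category column and then dt would probably suit this join better than an index on dt alone, though I have not measured that.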

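Edit: I asked about your version because SQL Server 2012+ supports RANGE/ROWS, which can create a moving window like the one you are going for. As it is, I believe you are stuck with the self-join. Using DATEADD() and comparing dt values is slightly faster than your DATEDIFF() version.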
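
For reference, on SQL Server 2012+ a ROWS frame could look like the sketch below. It rests on a strong assumption that will not hold for arbitrary data: foo has exactly one row per calendar day with no gaps, so that 31 rows correspond to the -30 to 0 day range. ROWS counts rows, not days, and the RANGE frame in SQL Server only accepts UNBOUNDED and CURRENT ROW bounds, so neither expresses a true 30-day offset over irregular dates.

```sql
-- Assumes exactly one row per calendar day in foo, with no missing days.
-- 30 PRECEDING plus the current row = 31 rows, matching "between -30 and 0" days.
SELECT    dt AS basedate
        , count(*)     OVER (ORDER BY dt ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS [n]
        , sum(value)   OVER (ORDER BY dt ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS [SumValue]
        , avg(value)   OVER (ORDER BY dt ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS [AverageValue]
        , stdev(value) OVER (ORDER BY dt ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS [StdevValue]
FROM  foo
ORDER BY dt DESC
```

If the data has gaps or several rows per day, this frame no longer lines up with calendar days, and the self-join remains the safer choice.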

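Playing around a bit with the setup provided in the SqlFiddle, I found these two possible solutions (well, the first is only half a solution; I'm not sure how I would add stdev() to it in an efficient way):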

WITH t1
  AS (SELECT DISTINCT dt as basedate from foo),
     sumcount
  AS (SELECT basedate,
             SUM((CASE WHEN datediff(d, t1.basedate, t2.dt) between -30 and 0 THEN 1 ELSE 0 END)) as [n],
             SUM((CASE WHEN datediff(d, t1.basedate, t2.dt) between -30 and 0 THEN value ELSE 0 END)) as [Sumvalue]
        FROM t1, foo t2
       GROUP BY basedate)
SELECT basedate,
       [n],
       [Sumvalue],
       [Sumvalue] / [n] as [Averagevalue]
  FROM sumcount
ORDER BY basedate DESC


GO

WITH t1
  AS (SELECT DISTINCT dt as basedate from foo),
     t2
  AS (SELECT basedate, min_date = DateAdd(day, -30, basedate), max_date = DateAdd(day, 0, basedate) from t1)

SELECT basedate,
          count(*) as [n]
        , sum(b.value) as [Sumvalue]
        , avg(b.value) As [Averagevalue]
        , stdev(b.value) As [Stdevvalue]
FROM  t2 
JOIN  foo b
   ON b.dt BETWEEN t2.min_date AND t2.max_date
GROUP BY basedate
ORDER BY basedate DESC

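I prefer the last one because it is simple and readable, and coincidentally it also runs much faster, although I can't yet say exactly why. Note that I loaded the test data an extra 100 times (using the magic of GO 100) to get longer durations on my laptop. (It's hard to compare 1 ms with 1 ms =)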

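Surprisingly, the (accepted) solution from Halt CO returns different results than the original query (or 'my' queries) when given the 'extended' test set; you may want to look into that! (The reason is that it finds each base date multiple times and therefore sums multiple times, ending up with larger count and sum values. I'm not sure whether that is what you want, or whether it is something that could happen in 'real data', but since you put a plain index rather than a UNIQUE index on that foo table, I assume duplicates are possible...)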
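
A small reproduction of the effect, as a sketch that uses a made-up table variable `@foo` with three rows instead of the fiddle data. Two rows share the same date, so the plain self-join treats that date as a base row twice.

```sql
-- Made-up sample data: two rows on 2013-01-01, one row on 2013-01-02.
DECLARE @foo TABLE (dt date NOT NULL, value float NOT NULL);
INSERT INTO @foo (dt, value)
VALUES ('2013-01-01', 10), ('2013-01-01', 20), ('2013-01-02', 30);

-- Plain self-join: every row of @foo is a base row, duplicates included.
SELECT a.dt AS basedate, count(*) AS [n], sum(b.value) AS [SumValue]
FROM  @foo a
JOIN  @foo b
   ON b.dt BETWEEN DATEADD(DAY, -30, a.dt) AND a.dt
GROUP BY a.dt
ORDER BY a.dt DESC;
-- 2013-01-02: n = 3, SumValue = 60
-- 2013-01-01: n = 4, SumValue = 60   (each of the 2 base rows matches 2 rows)

-- Distinct base dates: each date is a base row exactly once.
SELECT a.basedate, count(*) AS [n], sum(b.value) AS [SumValue]
FROM  (SELECT DISTINCT dt AS basedate FROM @foo) a
JOIN  @foo b
   ON b.dt BETWEEN DATEADD(DAY, -30, a.basedate) AND a.basedate
GROUP BY a.basedate
ORDER BY a.basedate DESC;
-- 2013-01-02: n = 3, SumValue = 60
-- 2013-01-01: n = 2, SumValue = 30
```

The two queries agree only on dates that occur once, which matches the difference I saw on the extended test set.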