Issue with aggregate columns when non-aggregates are not present

I'm running into an aggregate error in Amazon Redshift when I run the following query:

select case when frequency between (avg(frequency) + stddev(frequency)) and (avg(frequency) - stddev(frequency)) then  round(avg(frequency) - stddev(frequency))||'-'||round(avg(frequency) + stddev(frequency))
       when frequency between (avg(frequency) + 2*stddev(frequency)) and (avg(frequency) - 2*stddev(frequency)) then  round(avg(frequency) - 2*stddev(frequency))||'-'||round(avg(frequency) + 2*stddev(frequency))
       when frequency between (avg(frequency) + 3*stddev(frequency)) and (avg(frequency) - 3*stddev(frequency)) then  round(avg(frequency) - 3*stddev(frequency))||'-'||round(avg(frequency) + 3*stddev(frequency))
          else null
           end as deviation 
from schema.table

;

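The error tells me I need to include frequency in the GROUP BY clause. If I do that, I then get "aggregates not allowed in group by". Does anyone know why this happens? My first guess was that it might be a data type issue, but fiddling with that didn't help.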

Thanks!

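Queries like this can get confusing. You can compute the aggregates separately in a subquery and then use them on every row via a CROSS JOIN, or you can use analytic (window) functions, which give you the aggregate values without a GROUP BY:
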
-- Note: BETWEEN expects the lower bound first, so the bounds are swapped relative to the question.
SELECT case when frequency between (avg_Freq - dev_Freq) and (avg_Freq + dev_Freq) then  round(avg_Freq - dev_Freq)||'-'||round(avg_Freq + dev_Freq)
       when frequency between (avg_Freq - 2*dev_Freq) and (avg_Freq + 2*dev_Freq) then  round(avg_Freq - 2*dev_Freq)||'-'||round(avg_Freq + 2*dev_Freq)
       when frequency between (avg_Freq - 3*dev_Freq) and (avg_Freq + 3*dev_Freq) then  round(avg_Freq - 3*dev_Freq)||'-'||round(avg_Freq + 3*dev_Freq)
          else null
           end as deviation 
FROM schema.table
CROSS JOIN (SELECT avg(frequency) AS avg_Freq
            ,stddev(frequency) AS dev_Freq
      FROM schema.table
      )sub

Alternatively, you can add OVER() to each aggregate in your existing query:

-- Lower bound first in each BETWEEN (mean minus k*stddev, then mean plus k*stddev).
select case when frequency between (avg(frequency) OVER() - stddev(frequency) OVER()) and (avg(frequency) OVER() + stddev(frequency) OVER()) then  round(avg(frequency) OVER() - stddev(frequency) OVER())||'-'||round(avg(frequency) OVER() + stddev(frequency) OVER())
       when frequency between (avg(frequency) OVER() - 2*stddev(frequency) OVER()) and (avg(frequency) OVER() + 2*stddev(frequency) OVER()) then  round(avg(frequency) OVER() - 2*stddev(frequency) OVER())||'-'||round(avg(frequency) OVER() + 2*stddev(frequency) OVER())
       when frequency between (avg(frequency) OVER() - 3*stddev(frequency) OVER()) and (avg(frequency) OVER() + 3*stddev(frequency) OVER()) then  round(avg(frequency) OVER() - 3*stddev(frequency) OVER())||'-'||round(avg(frequency) OVER() + 3*stddev(frequency) OVER())
          else null
           end as deviation 
from schema.table

I'm not 100% sure of the Redshift syntax, but I believe both should work.
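
If repeating OVER() on every aggregate gets unwieldy, one option is to compute both window aggregates once in a derived table and refer to them by name in the outer query. This is only a sketch and I haven't run it on Redshift: the aliases avg_freq and dev_freq and the derived-table alias t are made up for illustration, and the bounds are written low-to-high because BETWEEN expects the lower bound first.

```sql
-- Sketch only: avg_freq, dev_freq and t are illustrative names, not from the original query.
SELECT CASE
         WHEN frequency BETWEEN avg_freq - dev_freq AND avg_freq + dev_freq
           THEN round(avg_freq - dev_freq) || '-' || round(avg_freq + dev_freq)
         WHEN frequency BETWEEN avg_freq - 2*dev_freq AND avg_freq + 2*dev_freq
           THEN round(avg_freq - 2*dev_freq) || '-' || round(avg_freq + 2*dev_freq)
         WHEN frequency BETWEEN avg_freq - 3*dev_freq AND avg_freq + 3*dev_freq
           THEN round(avg_freq - 3*dev_freq) || '-' || round(avg_freq + 3*dev_freq)
         ELSE NULL
       END AS deviation
FROM (
    -- Every row carries the table-wide average and standard deviation; no GROUP BY is needed.
    SELECT frequency,
           avg(frequency)    OVER () AS avg_freq,
           stddev(frequency) OVER () AS dev_freq
    FROM schema.table
) t;
```

The result should match the two queries above; the only difference is that each aggregate is spelled out once instead of in every branch.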

You can break it down like this:

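WITH temp AS (
    -- Aliases renamed from AVG/STDDEV so they don't clash with the function names.
    SELECT avg(frequency) AS avg_freq, stddev(frequency) AS dev_freq
    FROM schema.table
)
SELECT case when frequency between temp.avg_freq - temp.dev_freq and temp.avg_freq + temp.dev_freq
            then round(temp.avg_freq - temp.dev_freq)||'-'||round(temp.avg_freq + temp.dev_freq)
            -- etc. for the remaining WHEN branches
       end as deviation
FROM schema.table CROSS JOIN temp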

You'll have to check the exact syntax; I wrote this off the top of my head.