SQL aggregate function alias

Question

I am a beginner in SQL, and this is the problem I was asked to solve:

Say that a big city is defined as a place of type city with a population of at least 100,000. Write an SQL query that returns the scheme (state_name,no_big_city,big_city_population) ordered by state_name, listing those states which have either (a) at least five big cities or (b) at least one million people living in big cities. The column state_name is the name of the state, no_big_city is the number of big cities in the state, and big_city_population is the number of people living in big cities in the state.

Now, as far as I can tell, the following query returns the correct result:

SELECT state.name AS state_name
     , COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
     , SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM state
JOIN place
ON state.code = place.state_code
GROUP BY state_name
    HAVING
        COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 OR
        SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;

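However, each of the two aggregate expressions appears twice in the code. My question: is there any way to get rid of this duplication while keeping the functionality?

To be clear, I have already tried using the aliases, but I just get a "column does not exist" error.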

Answer 1

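I'm not sure whether this is a comment or an answer, since it is based more on preference than on anything technical, but I'll post it anyway.

When I need to reference calculated columns (usually many at once), I put the calculations in a derived table and then refer to them by their aliases outside of the derived table. The syntax should be valid ANSI SQL, but I am not familiar with PostgreSQL.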

SELECT *
FROM (
    SELECT state.name AS state_name
         , COUNT(CASE WHEN place.type = 'city'
                       AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
         , SUM(CASE WHEN place.type = 'city'
                     AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
    FROM state
    INNER JOIN place
        ON state.code = place.state_code
    GROUP BY state_name
) sub
-- The outer WHERE replaces the original HAVING clause and can use the aliases.
WHERE no_big_city >= 5
   OR big_city_population >= 1000000
ORDER BY state_name;

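The benefit of this approach is that, although the subquery/derived table adds some complexity, each formula lives in one place, so any change only has to be made once. I don't know whether this performs worse than simply repeating the calculations in HAVING, but I can't imagine it would be much worse.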

Answer 2

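The SELECT list describes what you want to return from the table(s) after the WHERE clause has filtered the rows, and GROUP BY determines how those filtered rows are grouped for the aggregate functions in the SELECT list. The output aliases are not available while the groups are being filtered, so they cannot be used in HAVING. You can, however, wrap the filtered and aggregated query in an outer SELECT. Something like: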

SELECT state_name, no_big_city, big_city_population
FROM
 (
   SELECT
     state.name AS state_name,
     COUNT(*) AS no_big_city,
     SUM(place.population) AS big_city_population
   FROM state JOIN place ON state.code = place.state_code
   WHERE
     place.type = 'city' AND
     place.population >= 100000
   GROUP BY state.name
  ) AS sub  -- a derived table needs an alias in older PostgreSQL versions
WHERE
   no_big_city >= 5 OR
   big_city_population >= 1000000
ORDER BY state_name;

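Also, moving the conditions

   place.type = 'city' AND
   place.population >= 100000

from the CASE expressions into WHERE should perform better: "no city" and "small city" records are not processed at all, especially if there is an index on the place.type column. States without any big city then drop out of the result entirely, which is fine here because they would fail both conditions anyway.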
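As a rough sketch of the index idea, assuming the place table from the question: the index name is hypothetical, and whether the planner actually uses the index depends on the data distribution, so it is worth checking the plan with EXPLAIN.

```sql
-- Hypothetical index name; assumes the place table from the question.
CREATE INDEX place_type_idx ON place (type);

-- Inspect the plan to see whether the index is used for the filter.
EXPLAIN
SELECT state_code, count(*), sum(population)
FROM   place
WHERE  type = 'city'
AND    population >= 100000
GROUP  BY state_code;
```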

Answer 3

The manual clarifies:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

The part that matters here is "but not in the WHERE or HAVING clauses".
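A minimal sketch of that rule, assuming the place table from the question (the alias n_big is made up for illustration). The first statement fails in Postgres because HAVING cannot see the output alias; the second repeats the expression in HAVING and uses the alias only in ORDER BY, where it is allowed:

```sql
-- Fails: output alias referenced in HAVING
SELECT state_code, count(*) AS n_big
FROM   place
WHERE  type = 'city' AND population >= 100000
GROUP  BY state_code
HAVING n_big >= 5;
-- ERROR:  column "n_big" does not exist

-- Works: expression written out in HAVING, alias used in ORDER BY
SELECT state_code, count(*) AS n_big
FROM   place
WHERE  type = 'city' AND population >= 100000
GROUP  BY state_code
HAVING count(*) >= 5
ORDER  BY n_big DESC;
```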

You can avoid typing long expressions repeatedly by using a subquery or a CTE:

SELECT state_name, no_big_city, big_city_population
FROM  (
   SELECT s.name AS state_name
        , COUNT(*)          FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
        , SUM(p.population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
   FROM   state s
   JOIN   place p ON s.code = p.state_code
   GROUP  BY s.name -- can be input column name as well, best table-qualified to avoid ambiguity
   ) sub
WHERE  no_big_city >= 5
   OR  big_city_population >= 1000000
ORDER  BY state_name;

While I was at it, I simplified the aggregates with the FILTER clause (Postgres 9.4+). Related:

How can I simplify this game statistics query?
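The same idea can be written with a CTE instead of a subquery. This is only a sketch under the assumptions of the question's schema; the CTE name big is made up for illustration:

```sql
WITH big AS (  -- hypothetical CTE name
   SELECT s.name AS state_name
        , count(*)          FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
        , sum(p.population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
   FROM   state s
   JOIN   place p ON s.code = p.state_code
   GROUP  BY s.name
   )
SELECT state_name, no_big_city, big_city_population
FROM   big
WHERE  no_big_city >= 5            -- aliases are visible here, outside the CTE
   OR  big_city_population >= 1000000
ORDER  BY state_name;
```

Either form keeps each aggregate expression in one place; which one reads better is largely a matter of taste.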

However, I suggest this simpler and faster query to begin with:

SELECT s.name AS state_name, p.no_big_city, p.big_city_population
FROM   state s
JOIN  (
   SELECT state_code      AS code  -- alias just to simplify join
        , count(*)        AS no_big_city
        , sum(population) AS big_city_population
   FROM   place
   WHERE  type = 'city'
   AND    population >= 100000
   GROUP  BY 1  -- can be ordinal number referencing position in SELECT list
   HAVING count(*) >= 5 OR sum(population) >= 1000000  -- simple expressions now
   ) p USING (code)
ORDER  BY 1;    -- can also be ordinal number

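I am demonstrating another option here: referencing expressions in GROUP BY and ORDER BY by their ordinal position in the SELECT list. Only use it if it does not impair readability and maintainability.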

Tags: sql, postgresql, case, aggregate-functions, having