SQL aggregate function alias
I am a beginner in SQL and this is the question I have been asked to solve:
Say that a big city is defined as a place of type city with a population of at least 100,000. Write an SQL query that returns the scheme (state_name, no_big_city, big_city_population) ordered by state_name, listing those states which have either (a) at least five big cities or (b) at least one million people living in big cities. The column state_name is the name of the state, no_big_city is the number of big cities in the state, and big_city_population is the number of people living in big cities in the state.
Now, as far as I can tell, the following query returns the correct results:
SELECT state.name AS state_name
, COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
, SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM state
JOIN place
ON state.code = place.state_code
GROUP BY state_name
HAVING
COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 OR
SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;
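However, the two aggregate expressions each appear twice in the code. My question: is there any way to get rid of this duplication while preserving the functionality?
To be clear, I have already tried using the aliases, but I just get a "column does not exist" error.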
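Not sure whether this is a comment or an answer, since it is based more on preference than on technique, but I'll post it anyway.
What I normally do when I need to reference a calculated column (often many of them at once) is put the calculated columns in a derived table and then reference them by alias outside the derived table. This syntax should be correct ANSI SQL, but I am not familiar with Postgres.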
select * from (
SELECT STATE.NAME AS state_name
,COUNT(CASE WHEN place.type = 'city'
AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
,SUM(CASE WHEN place.type = 'city'
AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM STATE
INNER JOIN place
ON STATE.code = place.state_code
GROUP BY state_name
) sub
where no_big_city >= 5
or big_city_population >= 1000000 -- OR and one million, matching the HAVING clause of the question
--HAVING COUNT(CASE WHEN place.type = 'city'
-- AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5
-- OR SUM(CASE WHEN place.type = 'city'
-- AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;
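The benefit of this approach is that, although the subquery/derived table adds some complexity, the formulas are kept in one place, so any change only has to be made once. I don't know whether this performs worse than simply repeating the calculations in the HAVING clause, but I can't imagine it would be worse.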
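The SELECT clause is what you want to select from the table(s) filtered by the WHERE clause.
GROUP BY is the condition for how the filtered records are grouped for the aggregate functions in SELECT, so, strictly speaking, the aliases defined in SELECT cannot be used there.
But you can wrap the filtered records in an outer select. Something like: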
SELECT state_name, no_big_city, big_city_population
FROM
(
SELECT
state.name AS state_name,
COUNT(1) no_big_city,
SUM(place.population) AS big_city_population
FROM state JOIN place ON state.code = place.state_code
WHERE
place.type = 'city' AND
place.population >= 100000
GROUP BY state.name
) sub -- derived table alias (required by older Postgres versions)
WHERE
no_big_city >= 5 OR
big_city_population >= 1000000 -- condition (b): total population living in big cities
ORDER BY state_name;
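Also, moving the conditions
place.type = 'city' AND
place.population >= 100000
from CASE to WHERE will perform better: records that are not cities, or are small cities, will not be processed at all. Especially if there is an index on the place.type column.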
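To check that moving the conditions from CASE to WHERE does not change the result, here is a minimal sketch. It is an assumption-laden stand-in: it runs against SQLite through Python's built-in sqlite3 module instead of Postgres, and the sample rows are invented; only the table and column names come from the question. One thing it illustrates is that a state with no big city at all disappears in the WHERE version before grouping, which is harmless because the final filter would drop it anyway.

```python
import sqlite3

# Hypothetical sample data; only table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE place (name TEXT, state_code TEXT, type TEXT, population INTEGER);
    INSERT INTO state VALUES ('AA', 'Alphaland'), ('BB', 'Betaland'), ('CC', 'Gammaland');
    INSERT INTO place VALUES
        ('A1', 'AA', 'city', 600000),
        ('A2', 'AA', 'city', 500000),
        ('A3', 'AA', 'town', 50000),
        ('B1', 'BB', 'town', 2000000),  -- large, but not of type city
        ('B2', 'BB', 'city', 50000),    -- city, but below the threshold
        ('C1', 'CC', 'city', 150000);   -- a single big city, not enough
""")

# Conditions inside CASE, repeated in HAVING (the shape of the original query).
CASE_VERSION = """
SELECT state.name AS state_name
     , COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 END) AS no_big_city
     , SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population END) AS big_city_population
FROM   state
JOIN   place ON state.code = place.state_code
GROUP  BY state.name
HAVING COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 END) >= 5
    OR SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population END) >= 1000000
ORDER  BY state_name;
"""

# Conditions moved to WHERE, aggregates referenced by alias in the outer query.
WHERE_VERSION = """
SELECT state_name, no_big_city, big_city_population
FROM (
    SELECT state.name            AS state_name
         , COUNT(*)              AS no_big_city
         , SUM(place.population) AS big_city_population
    FROM   state
    JOIN   place ON state.code = place.state_code
    WHERE  place.type = 'city'
    AND    place.population >= 100000
    GROUP  BY state.name
) sub
WHERE  no_big_city >= 5
   OR  big_city_population >= 1000000
ORDER  BY state_name;
"""

case_rows = conn.execute(CASE_VERSION).fetchall()
where_rows = conn.execute(WHERE_VERSION).fetchall()
print(case_rows)
print(where_rows)
```

On this sample both versions return only Alphaland, with two big cities and 1,100,000 inhabitants.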
From the Postgres manual:
An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.
Bold emphasis mine.
You can avoid typing long expressions repeatedly with a subquery or CTE:
SELECT state_name, no_big_city, big_city_population
FROM (
SELECT s.name AS state_name
, COUNT(*) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
, SUM(population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
FROM state s
JOIN place p ON s.code = p.state_code
GROUP BY s.name -- can be input column name as well, best schema-qualified to avoid ambiguity
) sub
WHERE no_big_city >= 5
OR big_city_population >= 1000000
ORDER BY state_name;
While being at it, I simplified with the aggregate FILTER clause (Postgres 9.4+):
- How can I simplify this game statistics query?
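The CTE alternative is only mentioned, not shown. A minimal sketch of how it might look follows. To keep it runnable without a server it uses Python's built-in sqlite3 module, so SQLite stands in for Postgres; the sample rows are invented for illustration, and only the table and column names come from the question. The conditions sit in WHERE rather than in a FILTER clause so that older SQLite builds can run it too.

```python
import sqlite3

# Hypothetical sample data; only table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE place (name TEXT, state_code TEXT, type TEXT, population INTEGER);
    INSERT INTO state VALUES ('AA', 'Alphaland'), ('BB', 'Betaland'), ('CC', 'Gammaland');
    INSERT INTO place VALUES
        ('A1', 'AA', 'city', 600000),
        ('A2', 'AA', 'city', 500000),
        ('A3', 'AA', 'town', 900000),   -- not a city, ignored
        ('B1', 'BB', 'city', 150000),
        ('B2', 'BB', 'city', 120000),
        ('B3', 'BB', 'city', 110000),
        ('B4', 'BB', 'city', 100000),
        ('B5', 'BB', 'city', 105000),
        ('B6', 'BB', 'city', 99999),    -- below the threshold, ignored
        ('C1', 'CC', 'city', 300000);
""")

# The aggregates are computed once in the CTE and referenced by alias afterwards.
CTE_QUERY = """
WITH big AS (
    SELECT s.name            AS state_name
         , COUNT(*)          AS no_big_city
         , SUM(p.population) AS big_city_population
    FROM   state s
    JOIN   place p ON s.code = p.state_code
    WHERE  p.type = 'city'
    AND    p.population >= 100000
    GROUP  BY s.name
)
SELECT state_name, no_big_city, big_city_population
FROM   big
WHERE  no_big_city >= 5
   OR  big_city_population >= 1000000
ORDER  BY state_name;
"""

rows = conn.execute(CTE_QUERY).fetchall()
for row in rows:
    print(row)
```

With this sample, Alphaland qualifies through condition (b) with 1,100,000 people in two big cities, Betaland through condition (a) with five big cities, and Gammaland is filtered out.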
However, I suggest this simpler and faster query to begin with:
SELECT s.name AS state_name, p.no_big_city, p.big_city_population
FROM state s
JOIN (
SELECT state_code AS code -- alias just to simplify join
, count(*) AS no_big_city
, sum(population) AS big_city_population
FROM place
WHERE type = 'city'
AND population >= 100000
GROUP BY 1 -- can be ordinal number referencing position in SELECT list
HAVING count(*) >= 5 OR sum(population) >= 1000000 -- simple expressions now
) p USING (code)
ORDER BY 1; -- can also be ordinal number
I am demonstrating another option for referencing expressions in GROUP BY and ORDER BY. Only use it if it does not impair readability and maintainability.
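As a quick sanity check that the ordinal references behave like the named ones, here is a small sketch comparing the two spellings. It again assumes SQLite via Python's built-in sqlite3 module as a stand-in for Postgres, with made-up sample rows; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample data; only table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE place (name TEXT, state_code TEXT, type TEXT, population INTEGER);
    INSERT INTO state VALUES ('ZZ', 'Zetaland'), ('AA', 'Alphaland'), ('BB', 'Betaland');
    INSERT INTO place VALUES
        ('Z1', 'ZZ', 'city', 800000),
        ('Z2', 'ZZ', 'city', 300000),
        ('A1', 'AA', 'city', 700000),
        ('A2', 'AA', 'city', 400000),
        ('B1', 'BB', 'city', 200000);
""")

# GROUP BY 1 / ORDER BY 1 refer to the first item of the respective SELECT list.
BY_POSITION = """
SELECT s.name AS state_name, p.no_big_city, p.big_city_population
FROM   state s
JOIN  (
    SELECT state_code AS code
         , count(*) AS no_big_city
         , sum(population) AS big_city_population
    FROM   place
    WHERE  type = 'city'
    AND    population >= 100000
    GROUP  BY 1
    HAVING count(*) >= 5 OR sum(population) >= 1000000
) p USING (code)
ORDER  BY 1;
"""

# The same query with the columns spelled out.
BY_NAME = """
SELECT s.name AS state_name, p.no_big_city, p.big_city_population
FROM   state s
JOIN  (
    SELECT state_code AS code
         , count(*) AS no_big_city
         , sum(population) AS big_city_population
    FROM   place
    WHERE  type = 'city'
    AND    population >= 100000
    GROUP  BY state_code
    HAVING count(*) >= 5 OR sum(population) >= 1000000
) p USING (code)
ORDER  BY s.name;
"""

by_position = conn.execute(BY_POSITION).fetchall()
by_name = conn.execute(BY_NAME).fetchall()
print(by_position)
print(by_name)
```

Both spellings return Alphaland and Zetaland, in that order, on this sample. The positional form is shorter, but it silently changes meaning if the SELECT list is reordered, which is the readability and maintainability trade-off to weigh.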