AWS Athena ALIAS in Group By does not get resolved

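I have a very basic GROUP BY query in Athena where I would like to use an alias. I can make the example work by putting the same expression in the GROUP BY, but that is not very handy when a complex column modification is going on and the logic has to be copied in two places. I have also done that in the past, and now I have a statement that does not work when the expression is copied over.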

The problematic query:

SELECT 
    substr(accountDescriptor, 5) as account, 
    sum(revenue) as grossRevenue 
FROM sales 
GROUP BY account

This raises an error:

alias Column 'account' cannot be resolved

The following works, so the problem is in how the alias is handled.

SELECT 
    substr(accountDescriptor, 5) as account, 
    sum(revenue) as grossRevenue 
FROM sales 
GROUP BY substr(accountDescriptor, 5)

Hive does not allow column aliases in GROUP BY, just as the SQL standard does not allow them. Some databases extend SQL to allow aliases there, but that is an extension.

Simply repeat the expression:

SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY substr(accountDescriptor, 5);

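That is because SQL is evaluated in a certain order: table scan, filter, aggregation, projection, sort. You are trying to use the result of the projection as input to the aggregation. In many cases this would be possible (when the projection is trivial, as in your case), but such behaviour is not defined in ANSI SQL, which Presto and Athena follow.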
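The evaluation order described above can be sketched as a toy pipeline. This is only an illustrative model in plain Python, not how Presto or Athena is implemented; the sample rows and the function names (`aggregate`, `project`, `group_by_expression`, `group_by_alias`) are made up for the example. The aggregation stage only sees the scanned input columns, so a name that is first created by the projection cannot be looked up there.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the `sales` table.
SALES = [
    {"accountDescriptor": "ACC-north", "revenue": 10},
    {"accountDescriptor": "ACC-north", "revenue": 5},
    {"accountDescriptor": "ACC-south", "revenue": 7},
]

def aggregate(rows, group_key):
    """Aggregation stage: it only sees the columns of the scanned rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[group_key(row)] += row["revenue"]
    return totals

def project(totals):
    """Projection stage: the output names (aliases) first exist here."""
    return [{"account": key, "grossRevenue": total}
            for key, total in sorted(totals.items())]

def group_by_expression(rows):
    # GROUP BY substr(accountDescriptor, 5); SQL substr is 1-based, hence [4:].
    return project(aggregate(rows, lambda r: r["accountDescriptor"][4:]))

def group_by_alias(rows):
    # GROUP BY account; "account" is not a column of the input rows.
    return project(aggregate(rows, lambda r: r["account"]))

print(group_by_expression(SALES))
try:
    group_by_alias(SALES)
except KeyError as exc:
    print("column cannot be resolved:", exc)
```

In this model the alias lookup fails with a `KeyError` for the same reason the query fails: the name is introduced one stage too late.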

We see that this is very useful in many cases, so support for it might be added in the future (as an extension to ANSI SQL).

Currently, there are several ways to work around this:

SELECT account, sum(revenue) as grossRevenue 
FROM (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
GROUP BY account

WITH better_sales AS (SELECT substr(accountDescriptor, 5) as account, revenue FROM sales)
SELECT account, sum(revenue) as grossRevenue 
FROM better_sales
GROUP BY account

SELECT account, sum(revenue) as grossRevenue 
FROM sales
CROSS JOIN LATERAL (SELECT substr(accountDescriptor, 5) as account)
GROUP BY account

SELECT substr(accountDescriptor, 5) as account, sum(revenue) as grossRevenue
FROM sales
GROUP BY 1;
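To check that these rewrites are interchangeable, here is a small sketch that runs them against an in-memory SQLite database from Python. It rests on some assumptions: the sample rows are invented, SQLite stands in for Athena only because it runs locally, and SQLite is more permissive than Athena (it resolves aliases in GROUP BY on its own), so this verifies that the workarounds return the same result rather than reproducing the error. The LATERAL variant is left out because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (accountDescriptor TEXT, revenue INTEGER)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ACC-north", 10), ("ACC-north", 5), ("ACC-south", 7)],
)

QUERIES = {
    # Compute the alias in a subquery, then group by it in the outer query.
    "subquery": """
        SELECT account, sum(revenue) AS grossRevenue
        FROM (SELECT substr(accountDescriptor, 5) AS account, revenue FROM sales)
        GROUP BY account
    """,
    # Same idea with a common table expression.
    "cte": """
        WITH better_sales AS (
            SELECT substr(accountDescriptor, 5) AS account, revenue FROM sales
        )
        SELECT account, sum(revenue) AS grossRevenue
        FROM better_sales
        GROUP BY account
    """,
    # Group by the position of the column in the SELECT list.
    "ordinal": """
        SELECT substr(accountDescriptor, 5) AS account, sum(revenue) AS grossRevenue
        FROM sales
        GROUP BY 1
    """,
    # Repeat the expression in GROUP BY.
    "repeated": """
        SELECT substr(accountDescriptor, 5) AS account, sum(revenue) AS grossRevenue
        FROM sales
        GROUP BY substr(accountDescriptor, 5)
    """,
}

def run(sql):
    """Execute a query and return its rows in a deterministic order."""
    return sorted(conn.execute(sql).fetchall())

results = {name: run(sql) for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```

All four variants should print the same rows, so the choice between them is mostly about readability and how much of the expression you are willing to repeat.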

In addition to the answers from kokosing and Gordon Linoff: you can use numbers that represent the position of the grouped column in the SELECT statement. Such an approach can also give you better performance, as described in section 8 of this AWS Blog. For example:

SELECT
    substr(accountDescriptor, 5) as account,
    sum(revenue) as grossRevenue
FROM sales
GROUP BY 1

Note: numbering starts at one, not zero.

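Here 1 acts somewhat like an alias for account. The main and obvious drawback is that if you change the order of the columns in the SELECT, you also need to account for that in the GROUP BY: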

SELECT
    sum(revenue) as grossRevenue,
    substr(accountDescriptor, 5) as account
FROM sales
GROUP BY 2