Why is ORDER BY not working in a Presto query?
I have a fairly simple Presto query that is not sorting by the column I specified:
(SELECT
tag_monitor_domains.property_name,
count(*) as HourCount
FROM pageviews
INNER JOIN tag_monitor_domains
ON pageviews.property_id = CAST(tag_monitor_domains.property_id AS varchar)
WHERE FROM_UNIXTIME(pageviews.time) > date_add('month', -1, CURRENT_DATE)
AND FROM_UNIXTIME(pageviews.time) < date_add('hour', -0, CURRENT_TIMESTAMP)
GROUP BY 1
ORDER BY 1 DESC)
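But the results are not sorted by property_name; the rows come back in random order.
Thanks @DShultz for reporting this. Indeed, this is the case, and I filed https://github.com/trinodb/trino/issues/6008 for it. Let's continue the discussion there on whether this is desired or buggy behavior.
As a workaround... well, remove the parentheses. But you already know that.
The more general reason why this happens: Presto ignores an ORDER BY when it does not change the semantics of the query (for example, in a subquery), as mandated by the SQL specification. For details, see https://trino.io/blog/2019/06/03/redundant-order-by.html.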
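For reference, this is the question's query with the enclosing parentheses removed, so the ORDER BY becomes part of the top-level query. It is a sketch that has not been run here and assumes the same pageviews and tag_monitor_domains tables as the question.

```sql
-- Same query as in the question, minus the outer parentheses: ORDER BY now
-- sits at the top level, so the sort applies to the final result.
SELECT
  tag_monitor_domains.property_name,
  count(*) AS HourCount
FROM pageviews
INNER JOIN tag_monitor_domains
  ON pageviews.property_id = CAST(tag_monitor_domains.property_id AS varchar)
WHERE FROM_UNIXTIME(pageviews.time) > date_add('month', -1, CURRENT_DATE)
  AND FROM_UNIXTIME(pageviews.time) < date_add('hour', -0, CURRENT_TIMESTAMP)
GROUP BY 1
ORDER BY 1 DESC  -- sorts by the first output column, property_name, descending
```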
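A minimal sketch of the distinction, using inline VALUES so no tables are needed. In the first query the inner ORDER BY does not change which rows the subquery returns, so an engine is free to drop it. In the second, ORDER BY together with LIMIT decides which rows survive, so it does change the result set and cannot simply be discarded, but the order in which the outer query returns those rows is still not guaranteed. Only the third query, with ORDER BY at the top level, guarantees the output order.

```sql
-- 1. ORDER BY inside a subquery: it does not affect which rows come back,
--    so the engine may ignore it; the output order is unspecified.
SELECT x
FROM (SELECT x FROM (VALUES 3, 1, 2) AS t(x) ORDER BY x) AS s;

-- 2. ORDER BY + LIMIT inside a subquery: the sort picks WHICH rows survive
--    (the two largest values), so it matters, but the outer query still
--    returns those rows in an unspecified order.
SELECT x
FROM (SELECT x FROM (VALUES 3, 1, 2) AS t(x) ORDER BY x DESC LIMIT 2) AS s;

-- 3. Top-level ORDER BY: the form that guarantees the output order.
SELECT x
FROM (SELECT x FROM (VALUES 3, 1, 2) AS t(x) ORDER BY x DESC LIMIT 2) AS s
ORDER BY x DESC;
```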
The SQL specification defines the following syntax rules:
<query expression> ::=
    [ <with clause> ]
    <query expression body>
    [ <order by clause> ]
    [ <result offset clause> ]
    [ <fetch first clause> ]
<query expression body> ::=
    <query term>
  | <query expression body> UNION [ ALL | DISTINCT ] [ <corresponding spec> ] <query term>
  | <query expression body> EXCEPT [ ALL | DISTINCT ] [ <corresponding spec> ] <query term>
<query term> ::=
    <query primary>
  | <query term> INTERSECT [ ALL | DISTINCT ] [ <corresponding spec> ] <query primary>
<query primary> ::=
    <simple table>
  | <left paren>
        <query expression body>
        [ <order by clause> ]
        [ <result offset clause> ]
        [ <fetch first clause> ]
    <right paren>
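To connect the grammar to concrete queries, here is an illustrative sketch with inline VALUES. In the first two statements the <order by clause> is immediately contained in the outermost <query expression>; in the second, the <query expression body> is a UNION ALL and the trailing ORDER BY sorts the combined rows. In the third, the ORDER BY sits inside a parenthesized <query primary>, which is the shape used in the question.

```sql
-- <query expression body> <order by clause>: ORDER BY is part of the
-- outermost <query expression>, so the row order is defined.
SELECT x FROM (VALUES 3, 1, 2) AS t(x) ORDER BY x;

-- The <query expression body> is a UNION ALL; the trailing ORDER BY
-- belongs to the whole <query expression> and sorts all combined rows.
SELECT 3 AS x UNION ALL SELECT 1 UNION ALL SELECT 2 ORDER BY x;

-- <left paren> <query expression body> <order by clause> <right paren>:
-- the ORDER BY belongs to the inner <query primary>; the outer
-- <query expression> has none, so row order is implementation-dependent.
(SELECT x FROM (VALUES 3, 1, 2) AS t(x) ORDER BY x);
```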
A parenthesized query is a <query expression> that consists of a single <query primary> of the form <left paren> <query expression body> ... <right paren>.
Furthermore, the specification states:
a) If QE does not immediately contain an <order by clause>, then the ordering of rows in T is implementation-dependent.
(QE is the <query expression>, and T is the table it produces.)
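So, when the query is a parenthesized <query primary>, the ORDER BY clause belongs to that <query primary> and is not immediately contained in the outer <query expression>; the ordering of the result rows is therefore not guaranteed.
Presto optimizes this case and issues a warning that you may not get the results you expect: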
presto> (SELECT x FROM (VALUES 1) t(x) ORDER BY x);
x
---
1
(1 row)
WARNING: ORDER BY in subquery may have no effect
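To get the desired ordering, make sure the ORDER BY clause is at the top level, either by removing the parentheses, as suggested in the other replies, or by moving it outside them: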
(SELECT ...)
ORDER BY 1 DESC