Why is ORDER BY not working in a Presto query?

I have a fairly simple Presto query that is not sorting by the column I specified:

(SELECT 
  tag_monitor_domains.property_name,
  count(*) as HourCount
FROM pageviews
  INNER JOIN tag_monitor_domains 
  ON pageviews.property_id = CAST(tag_monitor_domains.property_id AS varchar) 
WHERE FROM_UNIXTIME(pageviews.time) > date_add('month', -1, CURRENT_DATE)
AND FROM_UNIXTIME(pageviews.time) < date_add('hour', -0, CURRENT_TIMESTAMP)
GROUP BY 1
ORDER BY 1 DESC)

But the results are not ordered by property_name; the rows come back in random order.

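Thanks @DShultz for reporting this. It is indeed the case, and I filed https://github.com/trinodb/trino/issues/6008 for it. Let's continue the discussion there on whether this is expected or wrong behavior.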

As a workaround... well, remove the parentheses. But you know that already.
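For reference, here is a sketch of what that workaround could look like. It is the query from the question with only the outer parentheses dropped, so that ORDER BY belongs to the top-level query. The table and column names are taken from the question as-is, and nothing else has been changed or verified against the actual schema.

```sql
-- Same query as in the question, minus the enclosing parentheses.
-- ORDER BY is now part of the top-level query, so it determines the output order.
SELECT
  tag_monitor_domains.property_name,
  count(*) AS HourCount
FROM pageviews
  INNER JOIN tag_monitor_domains
  ON pageviews.property_id = CAST(tag_monitor_domains.property_id AS varchar)
WHERE FROM_UNIXTIME(pageviews.time) > date_add('month', -1, CURRENT_DATE)
AND FROM_UNIXTIME(pageviews.time) < date_add('hour', -0, CURRENT_TIMESTAMP)
GROUP BY 1
ORDER BY 1 DESC
```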

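The more general reason why this is the case: Presto ignores ORDER BY where it does not change the semantics of the query (for example, in a subquery), as governed by the SQL specification. For details, see https://trino.io/blog/2019/06/03/redundant-order-by.html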
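A minimal sketch of the distinction, assuming the behavior described in the linked blog post. The inline VALUES data and the aliases t and s are made up for illustration.

```sql
-- ORDER BY inside a subquery, with no LIMIT: it does not define the order of
-- the outer query's result, so the engine is free to skip the sort.
SELECT x
FROM (SELECT x FROM (VALUES 3, 1, 2) t(x) ORDER BY x) s;

-- ORDER BY together with LIMIT decides which rows the subquery returns,
-- so here the sort is not redundant. The outer ORDER BY is still what
-- guarantees the order of the final result.
SELECT x
FROM (SELECT x FROM (VALUES 3, 1, 2) t(x) ORDER BY x LIMIT 2) s
ORDER BY x;
```

The first query may happen to return sorted rows on a small input, but nothing guarantees it.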

The SQL specification defines the following syntax rules:

<query expression> ::=
  [ <with clause> ]
  <query expression body>
  [ <order by clause> ]
  [ <result offset clause> ]
  [ <fetch first clause> ]

<query expression body> ::=
    <query term>
  | <query expression body> UNION [ ALL | DISTINCT ] [ <corresponding spec> ] <query term>
  | <query expression body> EXCEPT [ ALL | DISTINCT ] [ <corresponding spec> ] <query term>

<query term> ::=
    <query primary>
  | <query term> INTERSECT [ ALL | DISTINCT ] [ <corresponding spec> ] <query primary>

<query primary> ::=
    <simple table>
  | <left paren>
       <query expression body>
       [ <order by clause> ]
       [ <result offset clause> ]
       [ <fetch first clause> ]
    <right paren>

A parenthesized query is a <query expression> that contains only a <query primary> of the form <left paren> <query expression body> ... <right paren>.

In addition, it specifies:

a) If QE does not immediately contain an <order by clause>, then the ordering of rows in T is implementation-dependent.

(QE is the <query expression>)

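So in the case of a parenthesized <query primary>, the ORDER BY clause belongs to the <query primary> and the <query expression> does not immediately contain one. The order of the rows is therefore not guaranteed.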

Presto optimizes this case away and issues a warning indicating that you may not get the results you expect:

presto> (SELECT x FROM (VALUES 1) t(x) ORDER BY x);
 x
---
 1
(1 row)

WARNING: ORDER BY in subquery may have no effect

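To get the desired order, you need to make sure the ORDER BY clause is at the top level, either by removing the parentheses as suggested in the other reply, or by moving it outside the parentheses: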

(SELECT ...)
ORDER BY 1 DESC