Why does adding ORDER BY to a query change the aggregate value?

Question

Following the example from the Vertica documentation at https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AnalyzingData/SQLAnalytics/AnalyticFunctionsVersusAggregateFunctions.htm?tocpath=Analyzing%20Data%7CSQL%20Analytics%7C_____2:

CREATE TABLE employees(emp_no INT, dept_no INT);
INSERT INTO employees VALUES(1, 10);
INSERT INTO employees VALUES(2, 30);
INSERT INTO employees VALUES(3, 30);
INSERT INTO employees VALUES(4, 10);
INSERT INTO employees VALUES(5, 30);
INSERT INTO employees VALUES(6, 20);
INSERT INTO employees VALUES(7, 20);
INSERT INTO employees VALUES(8, 20);
INSERT INTO employees VALUES(9, 20);
INSERT INTO employees VALUES(10, 20);
INSERT INTO employees VALUES(11, 20);
COMMIT;

If I run this query without ORDER BY, every row in a department gets the same count value:

dbadmin@b006bc38a718(*)=> 
select
  emp_no
, dept_no
, count(*) over (partition by dept_no) as emp_count
from employees;

 emp_no | dept_no | emp_count
--------+---------+-----------
      6 |      20 |         6
      7 |      20 |         6
      8 |      20 |         6
      9 |      20 |         6
     10 |      20 |         6
     11 |      20 |         6
      1 |      10 |         2
      4 |      10 |         2
      2 |      30 |         3
      3 |      30 |         3
      5 |      30 |         3
(11 rows)

But if I add ORDER BY, I get incremental values:

dbadmin@b006bc38a718(*)=> 
select
  emp_no
, dept_no
, count(*) over (partition by dept_no order by emp_no) as emp_count
from employees;

 emp_no | dept_no | emp_count
--------+---------+-----------
      2 |      30 |         1
      3 |      30 |         2
      5 |      30 |         3
      1 |      10 |         1
      4 |      10 |         2
      6 |      20 |         1
      7 |      20 |         2
      8 |      20 |         3
      9 |      20 |         4
     10 |      20 |         5
     11 |      20 |         6
(11 rows)

Time: First fetch (11 rows): 85.075 ms. All rows formatted: 85.139 ms

What effect does ORDER BY have? Why do I get incremental values?

Answer 1

If the window clause contains only PARTITION BY, the function returns the total for the whole partition: the same value for every row in that partition.

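If the window clause contains both PARTITION BY and ORDER BY, it returns a running count within the partition: for each row, the number of rows counted so far in that partition, taken in the order given by the ORDER BY expression.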
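
To see the two behaviours side by side, a sketch like the following puts both window definitions in one query. It assumes the employees table from the question with the column named dept_no, as in the CREATE TABLE statement; the aliases dept_total and running_count are only illustrative.

```sql
-- Assumes the employees(emp_no, dept_no) table created in the question.
SELECT emp_no
     , dept_no
       -- PARTITION BY only: the frame is the whole partition
     , COUNT(*) OVER (PARTITION BY dept_no)                 AS dept_total
       -- PARTITION BY + ORDER BY: the frame ends at the current row
     , COUNT(*) OVER (PARTITION BY dept_no ORDER BY emp_no) AS running_count
FROM employees
ORDER BY dept_no, emp_no;
```

For dept_no = 10 (emp_no 1 and 4), dept_total is 2 on both rows, while running_count is 1 on the first row and 2 on the second.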

That is simply how window functions work. They open up a world of possibilities...

Answer 2

This happens because, once the OVER() clause has an ORDER BY, Vertica applies a default frame clause, defined as:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
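
As a sketch of what that default means, the query below writes the frame out by hand using the full BETWEEN ... AND ... syntax. Assuming the employees(emp_no, dept_no) table from the question, it should return the same running counts as the query that has ORDER BY and no frame clause.

```sql
-- Assumes the employees(emp_no, dept_no) table created in the question.
SELECT emp_no
     , dept_no
     , COUNT(*) OVER (
         PARTITION BY dept_no
         ORDER BY emp_no
         -- the default frame, spelled out: partition start up to the current row
         RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS emp_count
FROM employees;
```

Because emp_no is unique in this table, RANGE and ROWS would give the same counts here. With duplicate ORDER BY values, a RANGE frame would also include the rows that tie with the current row.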

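So, to get the result you want while keeping the ORDER BY, you may need to add a frame clause like the following after the ORDER BY in the OVER() clause: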

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
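
Applied to the query from the question, that could look like the sketch below (again assuming the column is named dept_no, as in the CREATE TABLE statement):

```sql
-- Assumes the employees(emp_no, dept_no) table created in the question.
SELECT emp_no
     , dept_no
     , COUNT(*) OVER (
         PARTITION BY dept_no
         ORDER BY emp_no
         -- explicit frame covering the whole partition, overriding the default
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS emp_count
FROM employees;
```

With this frame every row reports the full count for its department (6 for dept 20, 2 for dept 10, 3 for dept 30), as in the query without ORDER BY.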

This behavior is documented as:

If the OVER clause omits specifying a window frame, the function creates a default window that extends from the current row to the first row in the current partition.

Link to doc


vertica