Why does adding ORDER BY to the query change the aggregate value?

I am following the Vertica example from https://www.vertica.com/docs/11.0.x/HTML/Content/Authoring/AnalyzingData/SQLAnalytics/AnalyticFunctionsVersusAggregateFunctions.htm?tocpath=Analyzing%20Data%7CSQL%20Analytics%7C_____2
CREATE TABLE employees(emp_no INT, dept_not INT);
INSERT INTO employees VALUES(1, 10);
INSERT INTO employees VALUES(2, 30);
INSERT INTO employees VALUES(3, 30);
INSERT INTO employees VALUES(4, 10);
INSERT INTO employees VALUES(5, 30);
INSERT INTO employees VALUES(6, 20);
INSERT INTO employees VALUES(7, 20);
INSERT INTO employees VALUES(8, 20);
INSERT INTO employees VALUES(9, 20);
INSERT INTO employees VALUES(10, 20);
INSERT INTO employees VALUES(11, 20);
COMMIT;

If I run this query without ORDER BY, I get the same count value for every row in a partition:

dbadmin@b006bc38a718(*)=> 
select 
  emp_no
, dept_not
, count(*) over (partition by dept_not) as emp_count 
from employees;
 emp_no | dept_not | emp_count
--------+----------+-----------
      6 |       20 |         6
      7 |       20 |         6
      8 |       20 |         6
      9 |       20 |         6
     10 |       20 |         6
     11 |       20 |         6
      1 |       10 |         2
      4 |       10 |         2
      2 |       30 |         3
      3 |       30 |         3
      5 |       30 |         3
(11 rows)

But if I add ORDER BY, I get incrementing values:

dbadmin@b006bc38a718(*)=> 
select 
  emp_no
, dept_not
, count(*) over (partition by dept_not order by emp_no) as emp_count 
from employees;
 emp_no | dept_not | emp_count
--------+----------+-----------
      2 |       30 |         1
      3 |       30 |         2
      5 |       30 |         3
      1 |       10 |         1
      4 |       10 |         2
      6 |       20 |         1
      7 |       20 |         2
      8 |       20 |         3
      9 |       20 |         4
     10 |       20 |         5
     11 |       20 |         6
(11 rows)

Time: First fetch (11 rows): 85.075 ms. All rows formatted: 85.139 ms

What effect does ORDER BY have? Why do I get incrementing values?

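If the window clause contains only PARTITION BY, the function returns the total for the partition: the same value for every row of that partition.

If the window clause contains both PARTITION BY and ORDER BY, the function returns a running count within the partition. That is, following the ORDER BY expression, it reports how many rows of the partition have been counted so far.

That is simply how window functions work. They give you a world of possibilities ...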
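To see the two behaviours side by side, here is a small sketch against the employees table from the question. The column aliases dept_total and running_count are made up for illustration, and the final ORDER BY is only there to make the output easier to read.

```sql
-- Sketch: both window definitions in one query.
SELECT
  emp_no
, dept_not
  -- PARTITION BY only: the whole partition is counted for every row
, COUNT(*) OVER (PARTITION BY dept_not) AS dept_total
  -- PARTITION BY + ORDER BY: rows counted up to and including the current row
, COUNT(*) OVER (PARTITION BY dept_not ORDER BY emp_no) AS running_count
FROM employees
ORDER BY dept_not, emp_no;
```

With the sample data, the two rows of department 10 should both show dept_total = 2, while running_count goes 1, 2.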

This happens because, when the OVER clause has an ORDER BY but no explicit frame, Vertica applies a default frame clause, defined as:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

So, to get the result you want, you may need to add a frame clause after the ORDER BY inside the OVER() clause, like this:

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
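Applied to the query from the question, that could look like the sketch below. It assumes the same employees table and column names as in the question; only the frame clause is new.

```sql
-- Sketch: the question's query with an explicit frame that spans
-- the whole partition, even though ORDER BY is present.
SELECT
  emp_no
, dept_not
, COUNT(*) OVER (
    PARTITION BY dept_not
    ORDER BY emp_no
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS emp_count
FROM employees;
```

With this frame every row should again carry its department's total (6, 2 and 3 for departments 20, 10 and 30), just as in the query without ORDER BY. For COUNT(*) the ORDER BY then no longer affects the result, so dropping it is the simpler option if a running count is not needed.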

This behavior is documented as:

If the OVER clause omits specifying a window frame, the function creates a default window that extends from the current row to the first row in the current partition.
