Why does adding ORDER BY to the query change the aggregate value?
Example:
CREATE TABLE employees(emp_no INT, dept_no INT);
INSERT INTO employees VALUES(1, 10);
INSERT INTO employees VALUES(2, 30);
INSERT INTO employees VALUES(3, 30);
INSERT INTO employees VALUES(4, 10);
INSERT INTO employees VALUES(5, 30);
INSERT INTO employees VALUES(6, 20);
INSERT INTO employees VALUES(7, 20);
INSERT INTO employees VALUES(8, 20);
INSERT INTO employees VALUES(9, 20);
INSERT INTO employees VALUES(10, 20);
INSERT INTO employees VALUES(11, 20);
COMMIT;
If I run this query without ORDER BY, every row in a department gets the same count value:
dbadmin@b006bc38a718(*)=>
select
emp_no
, dept_no
, count(*) over (partition by dept_no) as emp_count
from employees;
emp_no | dept_no | emp_count
--------+----------+-----------
6 | 20 | 6
7 | 20 | 6
8 | 20 | 6
9 | 20 | 6
10 | 20 | 6
11 | 20 | 6
1 | 10 | 2
4 | 10 | 2
2 | 30 | 3
3 | 30 | 3
5 | 30 | 3
(11 rows)
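To reproduce this outside Vertica, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (the first release with window functions) as a stand-in for Vertica. The outer ORDER BY only makes the output order deterministic. Unlike an ORDER BY inside OVER(), it is applied after the window function and does not change the count.

```python
import sqlite3

# Same data as the CREATE TABLE / INSERT statements above: (emp_no, dept_no).
EMPLOYEES = [(1, 10), (2, 30), (3, 30), (4, 10), (5, 30), (6, 20),
             (7, 20), (8, 20), (9, 20), (10, 20), (11, 20)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Vertica here
conn.execute("CREATE TABLE employees(emp_no INT, dept_no INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)

# Window without ORDER BY: the frame is the whole partition,
# so every row in a department gets that department's total.
partition_only = conn.execute("""
    SELECT emp_no, dept_no,
           COUNT(*) OVER (PARTITION BY dept_no) AS emp_count
    FROM employees
    ORDER BY dept_no, emp_no
""").fetchall()

for row in partition_only:
    print(row)
```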
But if I add ORDER BY inside OVER(), I get incrementing values:
dbadmin@b006bc38a718(*)=>
select
emp_no
, dept_no
, count(*) over (partition by dept_no order by emp_no) as emp_count
from employees;
emp_no | dept_no | emp_count
--------+----------+-----------
2 | 30 | 1
3 | 30 | 2
5 | 30 | 3
1 | 10 | 1
4 | 10 | 2
6 | 20 | 1
7 | 20 | 2
8 | 20 | 3
9 | 20 | 4
10 | 20 | 5
11 | 20 | 6
(11 rows)
Time: First fetch (11 rows): 85.075 ms. All rows formatted: 85.139 ms
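The same kind of sketch with ORDER BY emp_no inside OVER() reproduces the running count. It again assumes SQLite 3.25+ via Python's sqlite3 module as a stand-in for Vertica. The outer ORDER BY only fixes the display order.

```python
import sqlite3

# (emp_no, dept_no) pairs from the question.
EMPLOYEES = [(1, 10), (2, 30), (3, 30), (4, 10), (5, 30), (6, 20),
             (7, 20), (8, 20), (9, 20), (10, 20), (11, 20)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Vertica here
conn.execute("CREATE TABLE employees(emp_no INT, dept_no INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)

# ORDER BY inside OVER(): with no explicit frame, the frame runs from the
# first row of the partition to the current row, giving a running count.
running = conn.execute("""
    SELECT emp_no, dept_no,
           COUNT(*) OVER (PARTITION BY dept_no ORDER BY emp_no) AS emp_count
    FROM employees
    ORDER BY dept_no, emp_no
""").fetchall()

for row in running:
    print(row)
```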
What effect does ORDER BY have? Why do I get incrementing values?
If the window clause contains only PARTITION BY, the function returns the aggregate over the whole partition: the same value for every row in that partition.
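If the window clause contains both PARTITION BY and ORDER BY, it returns a running count within the partition: for each row, how many rows of the partition have been counted so far, in ORDER BY order.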
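The difference can be modelled in plain Python. The helper `window_counts` below is hypothetical (not a Vertica or library API). For each employee it computes both the whole-partition count and the running count. The running frame goes from the first row of the partition up to the current row, ordered by emp_no.

```python
from collections import defaultdict

# (emp_no, dept_no) pairs from the question.
EMPLOYEES = [(1, 10), (2, 30), (3, 30), (4, 10), (5, 30), (6, 20),
             (7, 20), (8, 20), (9, 20), (10, 20), (11, 20)]


def window_counts(rows):
    """Hypothetical helper: returns {emp_no: (dept_no, whole_partition, running)}.

    whole_partition mimics COUNT(*) OVER (PARTITION BY dept_no);
    running mimics COUNT(*) OVER (PARTITION BY dept_no ORDER BY emp_no).
    """
    partitions = defaultdict(list)
    for emp_no, dept_no in rows:
        partitions[dept_no].append(emp_no)

    result = {}
    for dept_no, emps in partitions.items():
        for emp_no in emps:
            # Running frame: rows of this partition whose sort key is <= the
            # current key (RANGE semantics, so ties on emp_no would all count).
            running = sum(1 for other in emps if other <= emp_no)
            result[emp_no] = (dept_no, len(emps), running)
    return result


counts = window_counts(EMPLOYEES)
for emp_no in sorted(counts):
    print(emp_no, counts[emp_no])
```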
That is exactly how window functions work, and it opens up a world of possibilities...
This happens because, when OVER() contains ORDER BY but no explicit frame clause, Vertica applies a default frame clause, defined as:
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
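One way to check this default is to write the frame out explicitly and compare the results. The sketch below uses SQLite through Python's sqlite3 module (3.25+ assumed) as a stand-in, not Vertica itself. SQLite's documented default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

```python
import sqlite3

# (emp_no, dept_no) pairs from the question.
EMPLOYEES = [(1, 10), (2, 30), (3, 30), (4, 10), (5, 30), (6, 20),
             (7, 20), (8, 20), (9, 20), (10, 20), (11, 20)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Vertica here
conn.execute("CREATE TABLE employees(emp_no INT, dept_no INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)

# Frame left implicit: only ORDER BY inside OVER().
implicit = conn.execute("""
    SELECT emp_no,
           COUNT(*) OVER (PARTITION BY dept_no ORDER BY emp_no)
    FROM employees ORDER BY emp_no
""").fetchall()

# Same window with the default frame spelled out.
explicit = conn.execute("""
    SELECT emp_no,
           COUNT(*) OVER (PARTITION BY dept_no ORDER BY emp_no
                          RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    FROM employees ORDER BY emp_no
""").fetchall()

print(implicit == explicit)  # True: omitting the frame clause gives this frame
```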
So to get the result you want while keeping ORDER BY, you can add a frame clause like the following after the ORDER BY in the OVER() clause:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
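Here is a sketch of the suggested fix. It runs in SQLite through Python's sqlite3 module (3.25+ assumed) as a stand-in for Vertica. Once the frame is widened to cover the whole partition, ORDER BY no longer turns the count into a running count.

```python
import sqlite3

# (emp_no, dept_no) pairs from the question.
EMPLOYEES = [(1, 10), (2, 30), (3, 30), (4, 10), (5, 30), (6, 20),
             (7, 20), (8, 20), (9, 20), (10, 20), (11, 20)]

conn = sqlite3.connect(":memory:")  # SQLite stands in for Vertica here
conn.execute("CREATE TABLE employees(emp_no INT, dept_no INT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)

# Explicit frame covering the whole partition, despite ORDER BY.
fixed = conn.execute("""
    SELECT emp_no, dept_no,
           COUNT(*) OVER (PARTITION BY dept_no ORDER BY emp_no
                          ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
               AS emp_count
    FROM employees
    ORDER BY dept_no, emp_no
""").fetchall()

for row in fixed:
    print(row)
```

For a plain count, simply dropping ORDER BY from OVER() gives the same totals, as the first query in the question shows.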
This behavior is documented as follows:
If the OVER clause omits specifying a window frame, the function creates a default window that extends from the current row to the first row in the current partition.