Does a Postgres window function perform implicit filtering when using ORDER BY in the partition?
I'm wondering what is happening in Postgres. Consider this ready-to-run snippet:
select id,
       value,
       array_agg(id) over (order by value asc) as "array_agg(id) with order",
       array_agg(id) over () as "array_agg(id) without order"
from
(
    values
        (1, 1000),
        (2, 2000)
) as rows (id, value)
order by id asc;
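As you might notice, I'm using window functions to get aggregates from a window frame. Neither projection, "array_agg(id) with order" nor "array_agg(id) without order", has any filtering. So I wonder why the column where I ordered the partition gives me the impression that there is only one id when ordered and two ids when unordered. What verb would you put here? To me, it would be filtering. The weird thing that makes me even more paranoid is that when I use a non-aggregate function like "lead", the window frame looks the same in both partitions. See for yourself: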
select id,
       value,
       lead(id) over (order by value asc) as "lead(id) with order",
       lead(id) over () as "lead(id) without order",
       array_agg(id) over (order by value asc) as "array_agg(id) with order",
       array_agg(id) over () as "array_agg(id) without order"
from
(
    values
        (1, 1000),
        (2, 2000)
) as rows (id, value)
order by id asc;
I read the official doc about Window Functions and found nothing about this. If someone could explain the real truth behind the milanesa, I would be very grateful.
As the documentation says:
The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition.
Now array_agg acts on the frame (it aggregates all rows in the frame), while lead does not. The documentation goes on to explain:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.
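So the presence of ORDER BY changes the meaning of the window's default frame. Nothing is filtered out of the partition; the aggregate simply sees a smaller frame for the earlier rows.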
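One way to convince yourself of this is to write the default frame out by hand and compare it with the implicit one. The following is only a sketch reusing the two rows from the question; the table alias t and the column aliases are made up for illustration.

```sql
-- Compare the implicit default frame with the same frame spelled out.
select id,
       value,
       -- implicit default frame that ORDER BY brings with it
       array_agg(id) over (order by value asc) as implicit_frame,
       -- the same frame written explicitly, as quoted from the documentation
       array_agg(id) over (order by value asc
                           range between unbounded preceding and current row) as explicit_frame,
       -- no ORDER BY: every row is a peer, so the frame is the whole partition
       array_agg(id) over () as whole_partition
from (values (1, 1000), (2, 2000)) as t (id, value)
order by id asc;
```

The first two columns should come out identical on every row ({1} for id 1, {1,2} for id 2), while the third holds both ids on both rows.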
You overlooked this paragraph in the tutorial:
By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.
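The "plus any following rows that are equal" part only becomes visible when there are ties. As a rough illustration, assume a third, hypothetical row (3, 2000) that ties with id 2 on value; the alias t and the column aliases are likewise made up.

```sql
-- Hypothetical data: ids 2 and 3 share value 2000, so they are ORDER BY peers.
select id,
       value,
       -- default RANGE frame: extends through the current row's last peer
       array_agg(id) over (order by value asc) as default_range_frame,
       -- ROWS frame: stops exactly at the current row, peers or not
       array_agg(id) over (order by value asc
                           rows between unbounded preceding and current row) as rows_frame
from (values (1, 1000), (2, 2000), (3, 2000)) as t (id, value)
order by id asc;
```

With the default frame, both rows with value 2000 should see all three ids, because each one's frame reaches its last peer. With the ROWS frame the two tied rows see different arrays, and which of them comes first is not determined by ORDER BY value alone.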
To overcome this, specify the frame explicitly so that it covers the whole partition:
select id,
       value,
       array_agg(id) over (order by value asc rows between unbounded preceding and unbounded following) as "array_agg(id) with order",
       array_agg(id) over () as "array_agg(id) without order"
from
(
    values
        (1, 1000),
        (2, 2000)
) as rows (id, value)
order by id asc;
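If several aggregates need the same full-partition frame, a named window can keep the query readable. This is a sketch rather than part of the original answer; the window name w, the table alias t, and the column aliases are arbitrary.

```sql
-- Define the ordered, full-partition frame once and reuse it.
select id,
       value,
       array_agg(id) over w as ids_in_partition,
       count(*)      over w as rows_in_partition
from (values (1, 1000), (2, 2000)) as t (id, value)
window w as (order by value asc
             rows between unbounded preceding and unbounded following)
order by id asc;
```

Both rows should then report {1,2} and 2, since the frame no longer ends at the current row.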