What is the expected behaviour for multiple set-returning functions in the SELECT clause?
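I'm trying to get a "cross join" from the results of two set-returning functions, but in some cases I don't get a "cross join". See the examples:
Behaviour 1: when the sets have the same length, items from each set are matched one-to-one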
postgres=# SELECT generate_series(1,3), generate_series(5,7) order by 1,2;
generate_series | generate_series
-----------------+-----------------
1 | 5
2 | 6
3 | 7
(3 rows)
Behaviour 2: when the sets have different lengths, the sets are "cross joined"
postgres=# SELECT generate_series(1,2), generate_series(5,7) order by 1,2;
generate_series | generate_series
-----------------+-----------------
1 | 5
1 | 6
1 | 7
2 | 5
2 | 6
2 | 7
(6 rows)
I think I'm misunderstanding something here. Can someone explain the expected behaviour?
Another, even stranger example:
postgres=# SELECT generate_series(1,2) x, generate_series(1,4) y order by x,y;
x | y
---+---
1 | 1
1 | 3
2 | 2
2 | 4
(4 rows)
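I'm looking for an answer to the question in the title, ideally with link(s) to the documentation.
I couldn't find any documentation on this. However, I can describe the behaviour I observe.
The set-generating functions each return a finite number of rows. Postgres seems to run the set-generating functions until all of them are on their last row at the same time, or, more likely, it stops when all of them are back at their first row. Technically, that would be the least common multiple (LCM) of the series lengths.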
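The observed behaviour can be modelled in a few lines of Python. This is a sketch of the semantics described above, not Postgres's executor code. The names `srf_select_list_pre10` and `generate_series` are hypothetical Python stand-ins, and the model assumes each function simply restarts when it runs out.

```python
from math import lcm  # Python 3.9+


def generate_series(a, b):
    # Hypothetical stand-in: inclusive on both ends, like Postgres generate_series(a, b).
    return list(range(a, b + 1))


def srf_select_list_pre10(*series):
    """Hypothetical model of Postgres <= 9.6 with several set-returning
    functions in one SELECT list: each function restarts when it runs out,
    and output stops once all of them are back at their first row together,
    i.e. after lcm(len(s) for s in series) rows."""
    n = lcm(*(len(s) for s in series))
    return [tuple(s[i % len(s)] for s in series) for i in range(n)]


# Reproduces the "even stranger" example from the question.
print(sorted(srf_select_list_pre10(generate_series(1, 2), generate_series(1, 4))))
# [(1, 1), (1, 3), (2, 2), (2, 4)]
```

With equal lengths (3 and 3) the LCM is 3, which gives the one-to-one matching of behaviour 1. Lengths 2 and 3 give 6 rows, which happens to coincide with a cross join (behaviour 2). Lengths 2 and 4 give only 4 rows.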
I'm not sure why this is the case. And, as I said in a comment, I think it is generally better to put the functions in the FROM clause.
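For comparison, moving both calls into the FROM clause, e.g. `SELECT * FROM generate_series(1,2) x CROSS JOIN generate_series(1,4) y;`, yields every combination regardless of how the lengths relate. The following Python model is a sketch only. The helper `cross_join` is hypothetical, and the model mirrors that query's rows but not their order, which SQL does not guarantee without ORDER BY.

```python
from itertools import product


def generate_series(a, b):
    # Hypothetical stand-in: inclusive on both ends, like Postgres generate_series(a, b).
    return list(range(a, b + 1))


def cross_join(*series):
    """Every combination of one element from each series: the same set of
    rows as a CROSS JOIN of the functions in the FROM clause (order aside)."""
    return list(product(*series))


rows = cross_join(generate_series(1, 2), generate_series(1, 4))
print(len(rows))  # 8
```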
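There is only a note on this in the documentation, and I'm not sure it explains the described behaviour. Perhaps more importantly, this usage of functions is deprecated: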
Currently, functions returning sets can also be called in the select list of a query. For each row that the query generates by itself, the function returning set is invoked, and an output row is generated for each element of the function's result set. Note, however, that this capability is deprecated and might be removed in future releases.
Postgres 10 or newer
Null values are added for the smaller set(s). Demo with generate_series():
SELECT generate_series( 1, 2) AS row2
, generate_series(11, 13) AS row3
, generate_series(21, 24) AS row4;
row2 | row3 | row4
-----+------+-----
1 | 11 | 21
2 | 12 | 22
null | 13 | 23
null | null | 24
If there is more than one set-returning function in the query's select
list, the behavior is similar to what you get from putting the
functions into a single LATERAL ROWS FROM( ... ) FROM-clause item. For
each row from the underlying query, there is an output row using the
first result from each function, then an output row using the second
result, and so on. If some of the set-returning functions produce
fewer outputs than others, null values are substituted for the missing
data, so that the total number of rows emitted for one underlying row
is the same as for the set-returning function that produced the most
outputs. Thus the set-returning functions run “in lockstep” until they
are all exhausted, and then execution continues with the next
underlying row.
This puts an end to the traditionally odd behaviour.
Postgres 9.6 or older
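The lockstep rule quoted from the manual corresponds closely to Python's `itertools.zip_longest` with `None` standing in for NULL. The sketch below reproduces the demo above for a single underlying row, i.e. a query without a FROM clause. The names `srf_select_list_pg10` and `generate_series` are hypothetical Python stand-ins.

```python
from itertools import zip_longest


def generate_series(a, b):
    # Hypothetical stand-in: inclusive on both ends, like Postgres generate_series(a, b).
    return list(range(a, b + 1))


def srf_select_list_pg10(*series):
    """Hypothetical model of Postgres 10+ for one underlying row: all
    set-returning functions advance in lockstep, and the shorter ones are
    padded with NULL (None here) until the longest one is exhausted."""
    return list(zip_longest(*series, fillvalue=None))


for row in srf_select_list_pg10(generate_series(1, 2),
                                generate_series(11, 13),
                                generate_series(21, 24)):
    print(row)
# (1, 11, 21)
# (2, 12, 22)
# (None, 13, 23)
# (None, None, 24)
```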
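The number of result rows (somewhat surprisingly!) is the least common multiple of the sizes of all sets in the same SELECT list. (It only acts like a CROSS JOIN if the set sizes are pairwise coprime, i.e. no two of them share a common divisor!) Demo: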
SELECT generate_series( 1, 2) AS row2
, generate_series(11, 13) AS row3
, generate_series(21, 24) AS row4;
row2 | row3 | row4
-----+------+-----
1 | 11 | 21
2 | 12 | 22
1 | 13 | 23
2 | 11 | 24
1 | 12 | 21
2 | 13 | 22
1 | 11 | 23
2 | 12 | 24
1 | 13 | 21
2 | 11 | 22
1 | 12 | 23
2 | 13 | 24
Documented in the manual for Postgres 9.6, chapter SQL Functions Returning Sets, along with the recommendation to avoid it:
Note: The key problem with using set-returning functions in the select
list, rather than the FROM clause, is that putting more than one
set-returning function in the same select list does not behave very
sensibly. (What you actually get if you do so is a number of output
rows equal to the least common multiple of the numbers of rows
produced by each set-returning function.) The LATERAL syntax produces
less surprising results when calling multiple set-returning functions,
and should usually be used instead.
Bold emphasis mine.
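To see when the old behaviour coincides with a CROSS JOIN, compare the two row counts. The LCM of the set sizes equals their product exactly when the sizes are pairwise coprime. In that case, by the Chinese remainder theorem, every combination appears exactly once. The sketch below checks this. All three helper names are hypothetical.

```python
from itertools import combinations
from math import gcd, lcm, prod  # lcm needs Python 3.9+


def pre10_row_count(*sizes):
    # Postgres <= 9.6, several SRFs in the SELECT list: LCM of the set sizes.
    return lcm(*sizes)


def cross_join_row_count(*sizes):
    # CROSS JOIN of the same functions in the FROM clause: product of the sizes.
    return prod(sizes)


def pairwise_coprime(*sizes):
    # True if no two sizes share a divisor greater than 1.
    return all(gcd(a, b) == 1 for a, b in combinations(sizes, 2))


for sizes in [(2, 3, 4), (2, 3, 5), (3, 3)]:
    print(sizes, pre10_row_count(*sizes), cross_join_row_count(*sizes),
          pairwise_coprime(*sizes))
# (2, 3, 4) 12 24 False
# (2, 3, 5) 30 30 True
# (3, 3) 3 9 False
```

The first line matches the 12-row demo above. Note that sizes 2, 3 and 4 share no divisor common to all three, yet the result is still not a cross join.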
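A single set-returning function is OK (but still cleaner in the FROM list), but multiple ones in the same SELECT list are now discouraged. This was a useful feature before we had LATERAL joins. Now it's merely historical ballast.
Related: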
- Parallel unnest() and sort order in PostgreSQL