What is the expected behaviour for multiple set-returning functions in the SELECT clause?

I am trying to get a "cross join" from the results of two set-returning functions, but in some cases I do not get a "cross join". See the examples below.

Behaviour 1: When the set lengths are the same, items from each set are matched one by one

postgres=# SELECT generate_series(1,3), generate_series(5,7) order by 1,2;
 generate_series | generate_series 
-----------------+-----------------
               1 |               5
               2 |               6
               3 |               7
(3 rows)

Behaviour 2: When the set lengths are different, the sets are "cross joined"

postgres=# SELECT generate_series(1,2), generate_series(5,7) order by 1,2;
 generate_series | generate_series 
-----------------+-----------------
               1 |               5
               1 |               6
               1 |               7
               2 |               5
               2 |               6
               2 |               7
(6 rows)

I think I am not understanding something here. Can someone explain the expected behaviour?

Another example, even weirder:

postgres=# SELECT generate_series(1,2) x, generate_series(1,4) y order by x,y;
 x | y 
---+---
 1 | 1
 1 | 3
 2 | 2
 2 | 4
(4 rows)

I am looking for an answer to the question in the title, ideally with link(s) to documentation.

I cannot find any documentation for this. However, I can describe the behavior that I observe.

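Each set-generating function returns a finite number of rows. Postgres seems to run the set-generating functions until all of them are on their last row at the same time (or, more likely, it stops when all of them are back on their first row). Technically, the number of rows produced is the least common multiple (LCM) of the series lengths.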
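The cycling described above can be imitated with plain modulo arithmetic. This is only a sketch of the observed pattern for lengths 2 and 4, not a description of what Postgres does internally. The alias g(i) and the hard-coded upper bound 3 (LCM(2, 4) minus 1) are assumptions chosen for this illustration.

```sql
-- Imitate the pre-10 behaviour of
--   SELECT generate_series(1,2) x, generate_series(1,4) y
-- by cycling each series until both wrap around together.
-- LCM(2, 4) = 4, so the counter i runs over 0..3 (hard-coded here).
SELECT (i % 2) + 1 AS x   -- series of length 2, restarts every 2 rows
     , (i % 4) + 1 AS y   -- series of length 4, restarts every 4 rows
FROM   generate_series(0, 3) AS g(i)
ORDER  BY x, y;
-- Returns (1,1), (1,3), (2,2), (2,4): the same four rows as the
-- "weirder" example in the question.
```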

I am not sure why this is the case. And, as I said in a comment, I think it is generally better to put the functions in the FROM clause.
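A minimal sketch of that suggestion, using the series from the question. The table aliases a and b are placeholders introduced only for this example. With the functions in the FROM clause, the cross join is explicit and does not depend on the series lengths.

```sql
-- Explicit cross join: every x is paired with every y,
-- so the result has 2 * 4 = 8 rows.
SELECT a.x, b.y
FROM   generate_series(1, 2) AS a(x)
CROSS  JOIN generate_series(1, 4) AS b(y)
ORDER  BY a.x, b.y;
```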

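The documentation contains only a note on this issue. I am not sure whether it explains the behavior described. Perhaps more importantly, this kind of function usage is deprecated: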

Currently, functions returning sets can also be called in the select list of a query. For each row that the query generates by itself, the function returning set is invoked, and an output row is generated for each element of the function's result set. Note, however, that this capability is deprecated and might be removed in future releases.

Postgres 10 or newer

Null values are added for the smaller set(s). Demo with generate_series():

SELECT generate_series( 1,  2) AS row2
     , generate_series(11, 13) AS row3
     , generate_series(21, 24) AS row4;
row2 | row3 | row4
-----+------+-----
   1 |   11 |   21
   2 |   12 |   22
null |   13 |   23
null | null |   24

dbfiddle here

The manual for Postgres 10:

If there is more than one set-returning function in the query's select list, the behavior is similar to what you get from putting the functions into a single LATERAL ROWS FROM( ... ) FROM-clause item. For each row from the underlying query, there is an output row using the first result from each function, then an output row using the second result, and so on. If some of the set-returning functions produce fewer outputs than others, null values are substituted for the missing data, so that the total number of rows emitted for one underlying row is the same as for the set-returning function that produced the most outputs. Thus the set-returning functions run “in lockstep” until they are all exhausted, and then execution continues with the next underlying row.

This ends the traditionally odd behavior.

Postgres 9.6 or older
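The ROWS FROM( ... ) construct that the manual compares this to can also be written out directly in the FROM clause. A sketch, assuming Postgres 9.4 or later (where ROWS FROM is available). The alias t and its column names are chosen here to match the demo above.

```sql
-- Run the three functions "in lockstep"; shorter sets are padded with NULL.
-- Returns 4 rows, the same as the Postgres 10 demo above.
SELECT *
FROM   ROWS FROM (
          generate_series( 1,  2)
        , generate_series(11, 13)
        , generate_series(21, 24)
       ) AS t(row2, row3, row4);
```

Written this way, the query should give the same null-padded result in older versions that support ROWS FROM, since it does not rely on the select-list behavior at all.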

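The number of result rows (somewhat surprisingly!) is the least common multiple of the sizes of all sets in the same SELECT list. (It only acts like a CROSS JOIN if the set sizes are pairwise coprime, i.e. no two of them share a common divisor greater than 1!) Demo: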

SELECT generate_series( 1,  2) AS row2
     , generate_series(11, 13) AS row3
     , generate_series(21, 24) AS row4;
row2 | row3 | row4
-----+------+-----
   1 |   11 |   21
   2 |   12 |   22
   1 |   13 |   23
   2 |   11 |   24
   1 |   12 |   21
   2 |   13 |   22
   1 |   11 |   23
   2 |   12 |   24
   1 |   13 |   21
   2 |   11 |   22
   1 |   12 |   23
   2 |   13 |   24

dbfiddle here

This is documented in the manual for Postgres 9.6, in the chapter SQL Functions Returning Sets, along with the recommendation to avoid it:

Note: The key problem with using set-returning functions in the select list, rather than the FROM clause, is that putting more than one set-returning function in the same select list does not behave very sensibly. (What you actually get if you do so is a number of output rows equal to the least common multiple of the numbers of rows produced by each set-returning function.) The LATERAL syntax produces less surprising results when calling multiple set-returning functions, and should usually be used instead.

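Bold emphasis mine.

A single set-returning function is OK (but still cleaner in the FROM list), but multiple set-returning functions in the same SELECT list are discouraged now. This was a useful feature before we had LATERAL joins. Now it is merely historical ballast.

Related: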
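For comparison, a sketch of the LATERAL form for the case where the function argument comes from each input row. The inline VALUES list and the names t, id, cnt, s and n are made up for this example.

```sql
-- For each row of t, generate the numbers 1..cnt.
-- LATERAL lets the function in FROM reference columns of t.
SELECT t.id, s.n
FROM   (VALUES (1, 2), (2, 3)) AS t(id, cnt)
CROSS  JOIN LATERAL generate_series(1, t.cnt) AS s(n)
ORDER  BY t.id, s.n;
-- Returns (1,1), (1,2), (2,1), (2,2), (2,3).
```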

  • Parallel unnest() and sort order in PostgreSQL