Why is the planner coming up with different results for functions with different volatilities?

Question

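This question follows up on, and grew out of, an earlier question. I should note that I don't consider this a duplicate, since that question asked for a solution to a specific problem. Here I am asking for more information about the behavior in general and demonstrating how to reproduce it. (To see the difference, look at the fairly long comment thread on the accepted answer there, where we discussed this behavior; I feel it went off topic, especially given its length.)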

I have a function. Here is a sample that exhibits the behavior of interest:

CREATE OR REPLACE FUNCTION test(INT)
  RETURNS TABLE(num INT, letter TEXT)
  VOLATILE
  LANGUAGE SQL
  AS $$
  SELECT *
  FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
  LIMIT $1;
  $$;

When I run this EXPLAIN:

EXPLAIN ANALYZE SELECT * FROM test(10);

I get this result in psql (where I've cut out a giant "Query Plan" header):

 Function Scan on test  (cost=0.25..10.25 rows=1000 width=36) (actual time=0.125..0.136 rows=5 loops=1)
 Total runtime: 0.179 ms
(2 rows)

Take note of the row estimate. It estimates 1000 rows.

But if I change the function to STABLE or IMMUTABLE:

CREATE OR REPLACE FUNCTION test(INT)
  RETURNS TABLE(num INT, letter TEXT)
  STABLE
  LANGUAGE SQL
  AS $$
  SELECT *
  FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
  LIMIT $1;
  $$;

Then the same EXPLAIN gives me a different plan:

 Limit  (cost=0.00..0.06 rows=5 width=36) (actual time=0.010..0.050 rows=5 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.06 rows=5 width=36) (actual time=0.005..0.018 rows=5 loops=1)
 Total runtime: 0.087 ms
(3 rows)

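Now it correctly estimates 5 rows, and it shows the plan for the query contained in the function. The estimated cost is far lower (0.06 versus 10.25). The runtime went down as well. (The query is so short that this may not be particularly significant.)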

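Given that the linked question deals with much more data and shows a very significant performance difference, it looks like the planner really does something different depending on whether the function is VOLATILE or STABLE/IMMUTABLE.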

What exactly is the planner doing here, and where can I read some documentation on it?

These tests were run on PG 9.3.

Answer 1

It estimates 1000 rows

The estimate of 1000 rows is the default value documented in CREATE FUNCTION:

execution_cost

A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.

result_rows

A positive number giving the estimated number of rows that the planner should expect the function to return. This is only allowed when the function is declared to return a set. The default assumption is 1000 rows.

When the function is declared VOLATILE, it asks not to be inlined, so the default value of result_rows holds.
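As a rough illustration, assuming the test(INT) function from the question, the system catalog shows the values the planner falls back on, and the ROWS clause described in the quoted documentation can replace the default estimate while the function stays VOLATILE. The value 5 is picked here only because the sample function returns at most five rows; it is an assumption for this sketch, not a recommendation.

```sql
-- Inspect the volatility marker and planner hints stored for the function.
-- provolatile is 'v' (VOLATILE), 's' (STABLE) or 'i' (IMMUTABLE).
SELECT proname, provolatile, procost, prorows
FROM pg_proc
WHERE proname = 'test';

-- Override the default 1000-row assumption without changing the volatility.
ALTER FUNCTION test(INT) ROWS 5;

-- The Function Scan row estimate should now follow the declared ROWS value.
EXPLAIN SELECT * FROM test(10);
```

This only adjusts the estimate attached to the Function Scan node; the function body is still planned separately and is not inlined.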

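On the other hand, when it gets inlined into a query, as in your second test, the number of rows is estimated as if the function body had been moved into the query and the function declaration did not exist. Since the VALUES clause can be evaluated directly, this leads to an exact estimate in the second test.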
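To picture what inlining amounts to, here is a hand-written approximation of the query the planner works with once the function body has been substituted into the caller. It is a sketch, not the planner's literal internal form, with the function's argument replaced by the literal 10 from the call test(10).

```sql
-- Roughly what SELECT * FROM test(10) becomes after inlining:
-- the function call is replaced by its body as a subquery.
EXPLAIN
SELECT *
FROM (
  SELECT *
  FROM (VALUES (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')) x
  LIMIT 10
) AS test(num, letter);
```

Running this by hand should give a plan of a similar shape to the STABLE case, with the row estimate derived from the five VALUES rows.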

What exactly is the planner doing here, and where can I read some documentation on it?

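In general, the planner's optimization strategies are not explained in the main documentation. They are discussed on the mailing lists and described in source code comments, which fortunately tend to be exceptionally clear and well written (compared to average source code). In the case of function inlining, I believe the comments on inline_set_returning_functions and inline_set_returning_function reveal most of the rules driving this particular optimization. (Warning: those comments live in the current master branch, which may change or drift at any time.)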

postgresql

function

volatile

sql-execution-plan