Interpretation of Postgres Actual Loops in EXPLAIN ANALYZE

When reading a Postgres EXPLAIN ANALYZE plan, the documentation states that under certain conditions an operator may be executed multiple times (reported as Actual Loops). In those cases, some performance measurements are shown per loop rather than aggregated over the whole operator (for example, Actual Total Time and Actual Rows):

In some query plans, it is possible for a subplan node to be executed more than once. For example, the inner index scan will be executed once per outer row in the above nested-loop plan. In such cases, the loops value reports the total number of executions of the node, and the actual time and rows values shown are averages per-execution. This is done to make the numbers comparable with the way that the cost estimates are shown. Multiply by the loops value to get the total time actually spent in the node. In the above example, we spent a total of 0.220 milliseconds executing the index scans on tenk2.

However, I have found several query plans where a sub-node's Actual Total Time, multiplied by the number of loops, becomes larger than the total time of the root node. In those cases, the reported time appears to already be the actual total time rather than a per-loop average (example below).

So under which conditions do the values of Actual Rows and Actual Total Time have to be multiplied by Actual Loops to get the correct result, and under which conditions are the values already correctly aggregated?

Example:

The query explain analyze select count(*) from name n join person_info pi on n.id = pi.person_id;

produces the following plan:

    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=233201.06..233201.07 rows=1 width=8) (actual time=1770.430..1823.586 rows=1 loops=1)
   ->  Gather  (cost=233200.84..233201.05 rows=2 width=8) (actual time=1769.845..1823.580 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=232200.84..232200.85 rows=1 width=8) (actual time=1768.782..1768.813 rows=1 loops=3)
               ->  Parallel Hash Join  (cost=145078.16..227887.97 rows=1725150 width=0) (actual time=1234.256..1611.255 rows=1376736 loops=3)
                     Hash Cond: (pi.person_id = n.id)
                     ->  Parallel Index Only Scan using person_id_person_info on person_info pi  (cost=0.43..78281.72 rows=1725150 width=4) (actual time=0.086..195.010 rows=1376736 loops=3)
                           Heap Fetches: 0
                     ->  Parallel Hash  (cost=111851.77..111851.77 rows=2658077 width=4) (actual time=782.084..782.085 rows=2126580 loops=3)
                           Buckets: 131072  Batches: 128  Memory Usage: 3040kB
                           ->  Parallel Seq Scan on name n  (cost=0.00..111851.77 rows=2658077 width=4) (actual time=0.018..345.728 rows=2126580 loops=3)
 Planning Time: 0.133 ms
 Execution Time: 1823.752 ms
(14 rows)

The total runtime of the final aggregate is 1823.586 ms. However, the last Parallel Hash operation has a runtime of 782.085 ms with 3 loops. According to the documentation, that would give a runtime of 782.085 * 3 ms = 2346.255 ms, which is larger than the total runtime of the root node. Since node times always include their children, that cannot be right.
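The contradiction can be checked with quick arithmetic (a small Python sanity check, using the numbers from the plan above):

```python
# Per-loop time of the Parallel Hash node and its loop count, from the plan above.
per_loop_ms = 782.085
loops = 3
root_total_ms = 1823.586  # actual total time of the root Finalize Aggregate

# Multiplying as the documentation suggests yields more than the root's total time,
# which is impossible if node times include their children.
naive_total_ms = per_loop_ms * loops
print(naive_total_ms)                  # 2346.255
print(naive_total_ms > root_total_ms)  # True
```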

I cannot give an exhaustive answer, but here are some rules of thumb:

  • The following applies only to nodes below a Gather node, because those are parallelized

  • Row counts, "Rows Removed by Filter", heap fetches and similar counts are divided by the number of processes ("loops"), so you have to multiply them to get the total

  • Execution times and buffer counts are not divided, so they must not be multiplied

When you think about it, this is obvious: if three parallel processes work for one second to compute a result, they are done after one second, not after three seconds. That is the whole point of parallel query!
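As a rough illustration of these rules of thumb, here is a minimal Python sketch (the helper name and structure are my own; the figures come from the Parallel Seq Scan in the plan above):

```python
def totals_below_gather(rows_per_loop, time_ms_per_loop, loops):
    """Interpret per-loop figures for a node below a Gather.

    Row counts are per-process averages, so multiply by loops;
    times are not divided, so leave them as reported.
    """
    return {
        "total_rows": rows_per_loop * loops,   # multiply counts
        "elapsed_ms": time_ms_per_loop,        # do NOT multiply times
    }

# Parallel Seq Scan on name: rows=2126580, actual time=..345.728, loops=3
stats = totals_below_gather(2126580, 345.728, 3)
print(stats["total_rows"])  # 6379740 rows scanned in total across all processes
print(stats["elapsed_ms"])  # 345.728 ms, already comparable to wall-clock time
```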