Nested Loop Left Join takes too much time?

Question

Here is the query:

EXPLAIN (analyze, BUFFERS, SETTINGS)
SELECT
    operation.id
FROM
    operation
RIGHT JOIN(
    SELECT uid, did FROM (
            SELECT uid, did FROM operation where id = 993754
        ) t
    ) parts ON (operation.uid = parts.uid AND operation.did = parts.did)

And the EXPLAIN output:

Nested Loop Left Join  (cost=0.85..29695.77 rows=100 width=8) (actual time=13.709..13.711 rows=1 loops=1)
  Buffers: shared hit=4905
  ->  Unique  (cost=0.42..8.45 rows=1 width=16) (actual time=0.011..0.013 rows=1 loops=1)
        Buffers: shared hit=5
        ->  Index Only Scan using oi on operation operation_1  (cost=0.42..8.44 rows=1 width=16) (actual time=0.011..0.011 rows=1 loops=1)
              Index Cond: (id = 993754)
              Heap Fetches: 1
              Buffers: shared hit=5
  ->  Index Only Scan using oi on operation  (cost=0.42..29686.32 rows=100 width=24) (actual time=13.695..13.696 rows=1 loops=1)
        Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
        Heap Fetches: 1
        Buffers: shared hit=4900
Settings: max_parallel_workers_per_gather = '4', min_parallel_index_scan_size = '0', min_parallel_table_scan_size = '0', parallel_setup_cost = '0', parallel_tuple_cost = '0', work_mem = '256MB'
Planning Time: 0.084 ms
Execution Time: 13.728 ms

Why does the Nested Loop take more time than the sum of its children? What can I do about it? The Execution Time should be under 1 ms, shouldn't it?

Update:

Nested Loop Left Join  (cost=5.88..400.63 rows=101 width=8) (actual time=0.012..0.012 rows=1 loops=1)
  Buffers: shared hit=8
  ->  Index Scan using oi on operation operation_1  (cost=0.42..8.44 rows=1 width=16) (actual time=0.005..0.005 rows=1 loops=1)
        Index Cond: (id = 993754)
        Buffers: shared hit=4
  ->  Bitmap Heap Scan on operation  (cost=5.45..391.19 rows=100 width=24) (actual time=0.004..0.005 rows=1 loops=1)
        Recheck Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
        Heap Blocks: exact=1
        Buffers: shared hit=4
        ->  Bitmap Index Scan on ou  (cost=0.00..5.42 rows=100 width=0) (actual time=0.003..0.003 rows=1 loops=1)
              Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
              Buffers: shared hit=3
Settings: max_parallel_workers_per_gather = '4', min_parallel_index_scan_size = '0', min_parallel_table_scan_size = '0', parallel_setup_cost = '0', parallel_tuple_cost = '0', work_mem = '256MB'
Planning Time: 0.127 ms
Execution Time: 0.028 ms

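Thanks, everyone. After I split the index into btree(id) and btree(uid, did), everything works perfectly. But what prevented the single combined index from serving both lookups? Are there any details or rules behind this?

By the way, this SQL is used for real-time computation, and there is some window function code that is not shown here.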

Answer 1

"Why does the Nested Loop take more time than the sum of its children?"

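Going by your example, it doesn't. Can you elaborate on what makes you think it does?

In any case, visiting 4900 pages to fetch 1 tuple seems extravagant. My first guess was that your table is not being vacuumed enough.

Although now I like Florian's suggestion better: "uid" and "did" are not the leading columns of the index, and that is why it is slow. It is essentially doing a full index scan, using the index as a skinny version of the table. Unfortunately, the EXPLAIN output does not make it clear when an index is being used this way rather than in the traditional "jump to a specific part of the index" way.

So you are missing an index.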
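
Both guesses above can be checked from the catalog. The following is a minimal sketch, assuming the table lives in a schema on the search path and is named operation as in the question. It uses only the standard pg_stat_user_tables and pg_indexes views.

```sql
-- Vacuum guess: a high n_dead_tup relative to n_live_tup, or old/NULL
-- vacuum timestamps, would suggest the table is not vacuumed enough.
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'operation';

-- Missing-index guess: list every index on the table with its definition,
-- so the column order of oi can be read directly from indexdef.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'operation';
```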

Answer 2

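The Nested Loop itself does not take much time. The actual time of 13.709..13.711 means that it took 13.709 ms until the first row was ready to be emitted from this node, and another 0.002 ms until it finished.

Note that the 13.709 ms startup time includes the time spent in its two child nodes. Both child nodes have to emit at least one row before the nested loop can emit its own first row.

The Unique child started emitting its first (and only) row after 0.011 ms. The Index Only Scan child, however, only started emitting its first (and only) row after 13.695 ms. This means that most of the actual time is spent in this Index Only Scan.

There is a good answer here that explains costs and actual times in depth.

There is also a nice tool at https://explain.depesz.com which calculates an inclusive and exclusive time for each node. Here it is used for your query plan, and it clearly shows that most of the time is spent in the Index Only Scan.

Since the query spends almost all of its time in this index only scan, optimizing it will give the biggest gain. Creating a separate index on the uid and did columns of the operation table should improve the query time a lot.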
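
The exclusive time of the join node can also be worked out by hand from the first plan. This is only a sketch of the arithmetic: it subtracts the total actual time of both children from the total actual time of the Nested Loop, which is valid here because every node reports loops=1.

```sql
-- Nested Loop total (13.711 ms) minus Unique (0.013 ms) and the
-- inner Index Only Scan (13.696 ms), all taken from the plan above.
SELECT 13.711 - (0.013 + 13.696) AS nested_loop_exclusive_ms;
-- nested_loop_exclusive_ms = 0.002
```

So the join itself accounts for roughly 0.002 ms, and the inner Index Only Scan for nearly everything else.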

CREATE INDEX operation_uid_did ON operation(uid, did);
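
Since the question mentions real-time use, a variant that avoids blocking writes while the index builds may be preferable. This is a sketch, not a recommendation specific to this system: it assumes the same index name as above and a PostgreSQL version that supports IF NOT EXISTS on CREATE INDEX (9.5 or later). CONCURRENTLY cannot run inside a transaction block.

```sql
-- Build the index without taking a lock that blocks inserts/updates.
CREATE INDEX CONCURRENTLY IF NOT EXISTS operation_uid_did
    ON operation (uid, did);

-- Refresh planner statistics, then re-run the EXPLAIN from the question
-- to confirm that the inner scan now uses the new index.
ANALYZE operation;
```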

The current execution plan contains 2 index only scans.

A slow one:

  ->  Index Only Scan using oi on operation  (cost=0.42..29686.32 rows=100 width=24) (actual time=13.695..13.696 rows=1 loops=1)
        Index Cond: ((uid = operation_1.uid) AND (did = operation_1.did))
        Heap Fetches: 1
        Buffers: shared hit=4900

And a fast one:

  ->  Index Only Scan using oi on operation operation_1  (cost=0.42..8.44 rows=1 width=16) (actual time=0.011..0.011 rows=1 loops=1)
        Index Cond: (id = 993754)
        Heap Fetches: 1
        Buffers: shared hit=5

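Both use the index oi, but with different index conditions. Note that the fast one, which uses id as its index condition, only needs to load 5 pages of data (Buffers: shared hit=5). The slow one needs to load 4900 pages (Buffers: shared hit=4900). This indicates that the index is optimized for lookups by id, but not for lookups by uid and did. The index oi probably covers all 3 columns in the order id, uid, did.

A multi-column btree index can only be used efficiently when the query has constraints on its leftmost (leading) columns. The official documentation about multi-column indexes explains this in depth.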
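
The leading-column rule can be reproduced on a throwaway table. The sketch below is illustrative only: operation_demo, oi_demo, ou_demo and the generated data are hypothetical and are not the asker's schema. The column order (id, uid, did) is the guess made above. The exact plans and buffer counts depend on the PostgreSQL version and settings, so none are shown.

```sql
-- Hypothetical demo table with one million rows.
CREATE TABLE operation_demo (id bigint, uid bigint, did bigint);
INSERT INTO operation_demo
SELECT g, g % 1000, g % 50
FROM generate_series(1, 1000000) AS g;

-- One combined index, with id as the leading column.
CREATE INDEX oi_demo ON operation_demo (id, uid, did);
VACUUM ANALYZE operation_demo;

-- Constraint on the leading column: the scan can descend straight
-- to the matching part of the index.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM operation_demo WHERE id = 993754;

-- No constraint on the leading column: if oi_demo is used at all,
-- the whole index has to be read. Compare the Buffers lines.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM operation_demo WHERE uid = 754 AND did = 4;

-- An index that leads with (uid, did) lets the second query
-- jump to the matching entries as well.
CREATE INDEX ou_demo ON operation_demo (uid, did);
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM operation_demo WHERE uid = 754 AND did = 4;

DROP TABLE operation_demo;
```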

postgresql

performance

explain