Postgres EXPLAIN ANALYZE Total Time Appears to Exceed Sum of Parts

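I'm trying to pin down some performance bottlenecks in my Postgres queries, and I ran EXPLAIN ANALYZE on one query to get some insight. The output of the query analysis is as follows: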

Nested Loop  (cost=162.35..5361.39 rows=4385 width=33) (actual time=5.663..315.153 rows=2 loops=1)
  ->  Seq Scan on "User" p=u  (cost=0.00..1.02 rows=1 width=8) (actual time=0.009..0.011 rows=1 loops=1)
        Filter: ("ID" = 1)
        Rows Removed by Filter: 1
  ->  Nested Loop  (cost=162.35..5316.51 rows=4385 width=33) (actual time=5.646..315.130 rows=2 loops=1)
        ->  Nested Loop  (cost=161.93..1854.59 rows=6574 width=33) (actual time=5.567..106.350 rows=44342 loops=1)
              ->  Seq Scan on "Op" o  (cost=0.00..1.34 rows=1 width=8) (actual time=0.007..0.011 rows=1 loops=1)
                    Filter: ("Name" = 'write'::text)
                    Rows Removed by Filter: 26
              ->  Bitmap Heap Scan on "Account" a  (cost=161.93..1768.73 rows=8453 width=33) (actual time=5.551..45.435 rows=44342 loops=1)
                    Recheck Cond: ("OId" = o."ID")
                    Filter: ("UId" = 1)
                    Heap Blocks: exact=1480
                    ->  Bitmap Index Scan on "IX_Account_op_ID"  (cost=0.00..159.82 rows=8453 width=0) (actual time=5.138..5.139 rows=44342 loops=1)
                          Index Cond: ("OId" = o."ID")
        ->  Index Scan using "PK_Resource_ID" on "Resources" r  (cost=0.42..0.53 rows=1 width=8) (actual time=0.003..0.003 rows=0 loops=44342)
              Index Cond: ("ID" = a."ResourceId")
              Filter: ("Role" = ANY ('{r1,r2,r3,r4,r5}'::text[]))
              Rows Removed by Filter: 1
Planning Time: 0.777 ms
Execution Time: 315.220 ms

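I haven't done much query analysis with pg explain before, so I'm still trying to understand everything it is telling me here. One thing confuses me. I can see that the outermost node: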

Nested Loop  (cost=162.35..5361.39 rows=4385 width=33) (actual time=5.663..315.153 rows=2 loops=1)

says the actual time is 5.663..315.153. OK, that makes sense, since the total execution time is 315 ms. Then, slightly below that:

  ->  Nested Loop  (cost=162.35..5316.51 rows=4385 width=33) (actual time=5.646..315.130 rows=2 loops=1)

OK! This tells me that most of the total execution time is spent in this part. Below it I see:

    ->  Nested Loop  (cost=161.93..1854.59 rows=6574 width=33) (actual time=5.567..106.350 rows=44342 loops=1)
          ->  Seq Scan on "Op" o  (cost=0.00..1.34 rows=1 width=8) (actual time=0.007..0.011 rows=1 loops=1)
                Filter: ("Name" = 'write'::text)
                Rows Removed by Filter: 26
          ->  Bitmap Heap Scan on "Account" a  (cost=161.93..1768.73 rows=8453 width=33) (actual time=5.551..45.435 rows=44342 loops=1)
                Recheck Cond: ("OId" = o."ID")
                Filter: ("UId" = 1)
                Heap Blocks: exact=1480

Hmm. So this says there is a nested loop that takes 106.35 ms and loops once, and that its contents are a seq scan taking 0.011 ms and a bitmap heap scan taking 45.435 ms.

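All of the nested loops report loops=1, yet in both cases the loop's total execution time looks higher than the sum of its contents. The inner loop takes 106 ms to execute, but if I add up its contents it looks like it should take only 45.446 ms (0.011 ms + 45.435 ms). The outer loop takes 315.13 ms, but if I add up its contents it looks like it should be 106.353 ms (106.35 ms + 0.003 ms).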

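I'm looking at this and assuming that loops=1 means the contents of the loop are executed only once, even though the total time suggests they were executed more than once. I'm not sure where I'm misunderstanding this. Can anyone shed some light on it for me? Any advice would be much appreciated!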

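EXPLAIN ANALYZE does not show how long it takes to evaluate the expressions in the select list (the tasks in SELECT tasks FROM [..]). Here pg_sleep(1) takes 20000 ms (20 x 1 second), but you can only see that time in the Nested Loop node: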

explain (analyze, verbose)
select *, pg_sleep(1)
  from generate_series(1,5) AS w
  cross join generate_series(current_date,current_date+3,interval '1 day') AS v;

QUERY PLAN
Nested Loop  (cost=0.02..22510.02 rows=1000000 width=16) (actual time=1001.082..20021.134 rows=20 loops=1)
  Output: w.w, v.v, pg_sleep('1'::double precision)
  ->  Function Scan on pg_catalog.generate_series w  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.006..0.008 rows=5 loops=1)
        Output: w.w
        Function Call: generate_series(1, 5)
  ->  Function Scan on pg_catalog.generate_series v  (cost=0.02..10.02 rows=1000 width=8) (actual time=0.005..0.010 rows=4 loops=5)
        Output: v.v
        Function Call: generate_series((('now'::cstring)::date)::timestamp with time zone, ((('now'::cstring)::date + 3))::timestamp with time zone, '1 day'::interval)
Planning time: 0.093 ms
Execution time: 20021.178 ms
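
As a rough check of how little of those 20 seconds the child nodes explain, the three node lines above can be run through a short Python sketch. It relies on the documented behaviour that "actual time" is an average per loop, so a node's total is approximately its end time multiplied by loops. The helper name node_total_ms and the regular expression are illustrative assumptions, not part of any Postgres tooling.

```python
import re

# Captures the end time and the loop count from
# "(actual time=START..END rows=R loops=N)".
ACTUAL = re.compile(r"actual time=\d+\.\d+\.\.(\d+\.\d+) rows=\d+ loops=(\d+)")

def node_total_ms(plan_line: str) -> float:
    """Approximate total time of a node: per-loop end time times loops."""
    match = ACTUAL.search(plan_line)
    if match is None:
        raise ValueError("line has no 'actual time' annotation")
    return float(match.group(1)) * int(match.group(2))

# The three node lines from the pg_sleep plan above, copied verbatim.
nested_loop = "Nested Loop  (cost=0.02..22510.02 rows=1000000 width=16) (actual time=1001.082..20021.134 rows=20 loops=1)"
scan_w = "->  Function Scan on pg_catalog.generate_series w  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.006..0.008 rows=5 loops=1)"
scan_v = "->  Function Scan on pg_catalog.generate_series v  (cost=0.02..10.02 rows=1000 width=8) (actual time=0.005..0.010 rows=4 loops=5)"

children_ms = node_total_ms(scan_w) + node_total_ms(scan_v)
unaccounted_ms = node_total_ms(nested_loop) - children_ms

print(f"children:    {children_ms:.3f} ms")     # 0.008 * 1 + 0.010 * 5
print(f"unaccounted: {unaccounted_ms:.3f} ms")  # charged to the Nested Loop itself
```

The two function scans add up to about 0.058 ms, so essentially all of the roughly 20021 ms belongs to the Nested Loop node, whose Output line is where pg_sleep is evaluated.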

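It also does not show how long it takes to JOIN the data sets. Here fetching the data takes about 1000 ms (about 35 ms for w plus 50000 loops of about 0.021 ms each for v), yet the Nested Loop node reports about 3128 ms and the whole query about 4103 ms, three to four times as long. The extra time is the join itself, the only remaining work, and it is not listed separately in the verbose plan.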

explain (analyze, verbose)
select *
  from generate_series(1,50000) AS w
  cross join generate_series(current_date,current_date+30,interval '1 day') AS v;

QUERY PLAN
Nested Loop  (cost=0.02..20010.02 rows=1000000 width=12) (actual time=3.237..3128.123 rows=1550000 loops=1)
  Output: w.w, v.v
  ->  Function Scan on pg_catalog.generate_series w  (cost=0.00..10.00 rows=1000 width=4) (actual time=3.210..35.472 rows=50000 loops=1)
        Output: w.w
        Function Call: generate_series(1, 50000)
  ->  Function Scan on pg_catalog.generate_series v  (cost=0.02..10.02 rows=1000 width=8) (actual time=0.001..0.021 rows=31 loops=50000)
        Output: v.v
        Function Call: generate_series((('now'::cstring)::date)::timestamp with time zone, ((('now'::cstring)::date + 30))::timestamp with time zone, '1 day'::interval)
Planning time: 0.046 ms
Execution time: 4103.113 ms
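
The same arithmetic can be written out for this plan and for the plan in the question. The sketch below is only an approximation under two assumptions: that a child's total is its per-loop "actual time" end value multiplied by loops (the per-loop value is a rounded average), and that whatever is left over is the parent's own work plus timing overhead. The names total_ms and self_ms are hypothetical helpers.

```python
from typing import Iterable, Tuple

# (per-loop "actual time" end value in ms, loops), read off a plan line by hand.
Node = Tuple[float, int]

def total_ms(node: Node) -> float:
    """Approximate total time spent in a node across all of its loops."""
    per_loop_ms, loops = node
    return per_loop_ms * loops

def self_ms(parent: Node, children: Iterable[Node]) -> float:
    """Parent time that its children do not account for."""
    return total_ms(parent) - sum(total_ms(child) for child in children)

# Cross join above: Nested Loop over w (loops=1) and v (loops=50000).
join_self_ms = self_ms((3128.123, 1), [(35.472, 1), (0.021, 50000)])
print(f"join itself: {join_self_ms:.3f} ms")

# Plan in the question: the Index Scan on "Resources" has loops=44342,
# so its 0.003 ms is paid 44342 times, not once.
index_scan_total_ms = total_ms((0.003, 44342))
print(f"index scan total: {index_scan_total_ms:.3f} ms")

# Outer Nested Loop in the question, minus both of its children.
outer_self_ms = self_ms((315.130, 1), [(106.350, 1), (0.003, 44342)])
print(f"outer loop itself: {outer_self_ms:.3f} ms")
```

For the cross join this leaves roughly 2040 ms for the join itself. For the question's plan, the index scan contributes roughly 133 ms rather than 0.003 ms, which closes most of the gap between 106 ms and 315 ms; because 0.003 ms is a rounded average, these figures are only indicative.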