Why is a union of two queries faster than one of the unioned queries on its own?

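I am debugging query times on two Google Cloud SQL Postgres 9.6 instances with autovacuum enabled. Staging (no traffic): 7.5 GB + 2 vCPU. Production: 37.5 GB + 10 vCPU. The results are the same on both, and they are confusing.

Indexes:

Consistently 100-120 ms: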

SELECT * FROM "Trade" WHERE "user1" = 1
UNION
SELECT * FROM "Trade" WHERE "user2" = 1
LIMIT 24;
Limit  (cost=221.92..222.16 rows=24 width=1187) (actual time=0.115..0.124 rows=24 loops=1)
  ->  HashAggregate  (cost=221.92..222.46 rows=54 width=1187) (actual time=0.115..0.121 rows=24 loops=1)
        Group Key: id, status, user1, user2
        ->  Append  (cost=4.60..218.55 rows=54 width=1187) (actual time=0.024..0.076 rows=26 loops=1)
              ->  Bitmap Heap Scan on "Trade"  (cost=4.60..89.99 rows=22 width=155) (actual time=0.024..0.061 rows=23 loops=1)
                    Recheck Cond: (user1 = 1)
                    Heap Blocks: exact=20
                    ->  Bitmap Index Scan on trade_depositor_user_id  (cost=0.00..4.59 rows=22 width=0) (actual time=0.016..0.016 rows=23 loops=1)
                          Index Cond: (user1 = 1)
              ->  Bitmap Heap Scan on "Trade" "Trade_1"  (cost=4.67..128.02 rows=32 width=155) (actual time=0.011..0.014 rows=3 loops=1)
                    Recheck Cond: (user2 = 1)
                    Heap Blocks: exact=3
                    ->  Bitmap Index Scan on trade_withdrawer_user_id  (cost=0.00..4.67 rows=32 width=0) (actual time=0.009..0.009 rows=3 loops=1)
                          Index Cond: (user2 = 1)
Planning time: 0.224 ms
Execution time: 0.189 ms

Consistently 280-350 ms:

SELECT * FROM "Trade" WHERE "user1" = 1
Bitmap Heap Scan on "Trade"  (cost=4.60..89.99 rows=22 width=155) (actual time=0.023..0.054 rows=23 loops=1)
  Recheck Cond: (user1 = 1)
  Heap Blocks: exact=20
  ->  Bitmap Index Scan on trade_user1  (cost=0.00..4.59 rows=22 width=0) (actual time=0.015..0.015 rows=23 loops=1)
        Index Cond: (user1 = 1)
Planning time: 0.077 ms
Execution time: 0.078 ms

Both queries return result sets of equal size. I have tried different variants of the simpler query, such as ordering by ID ASC/DESC.

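To make it a fair comparison, you should use UNION ALL, which simply concatenates the results, rather than UNION, which deduplicates the results, typically by sorting or hashing them.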
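As a rough sketch of that suggestion, the query from the question can be rerun with UNION ALL under EXPLAIN ANALYZE. The table and column names are taken from the question; everything else is only an illustration, and the plan you get may differ.

```sql
-- Same two branches as in the question, concatenated without deduplication.
-- Note: a row with both "user1" = 1 and "user2" = 1 would appear twice here,
-- which plain UNION would have collapsed into one row.
EXPLAIN ANALYZE
SELECT * FROM "Trade" WHERE "user1" = 1
UNION ALL
SELECT * FROM "Trade" WHERE "user2" = 1
LIMIT 24;
```

With UNION ALL the plan should no longer need the HashAggregate step that appears in the UNION plan, so the two variants do comparable work per branch.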

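Perhaps the optimizer is thrown off by the deduplication step and picks different heuristics, and the small row counts produce an edge case that does not mean much.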

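The timings I was quoting appear to have come from PgAdmin. After connecting over SSH to the actual database network/server, I can see that the difference between the two variants is negligible and is actually in line with what EXPLAIN ANALYSE shows.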

So in fact, running the UNION is not faster than running the single query.
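For anyone wanting to repeat that check, a minimal sketch in psql follows. It assumes a psql session opened from a host close to the database, so that network latency and client rendering do not dominate the numbers. The query is the one from the question.

```sql
-- Ask psql to print the elapsed time it observes for each statement.
\timing on

-- Client-observed time: includes the round trip and transferring the rows.
SELECT * FROM "Trade" WHERE "user1" = 1;

-- Server-side time: "Execution time" in the output is measured by Postgres
-- itself and excludes sending the result rows to the client.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM "Trade" WHERE "user1" = 1;
```

If the time reported by `\timing` is far larger than the "Execution time" line, the gap is being spent outside query execution (network, client, result transfer) rather than in the plan itself.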