Why is a union of two queries faster than a single one of the unioned queries?
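I am debugging query times on two Google Cloud SQL Postgres 9.6 instances with autovacuuming enabled. Staging (no traffic): 7.5 GB + 2 vCPU. Production: 37.5 GB + 10 vCPU. The results are the same on both, and confusing.
Indexes: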
- trade_user1
- trade_user2
Consistently 100-120 ms:
SELECT * FROM "Trade" WHERE "user1" = 1
UNION
SELECT * FROM "Trade" WHERE "user2" = 1
LIMIT 24;
Limit (cost=221.92..222.16 rows=24 width=1187) (actual time=0.115..0.124 rows=24 loops=1)
-> HashAggregate (cost=221.92..222.46 rows=54 width=1187) (actual time=0.115..0.121 rows=24 loops=1)
Group Key: id, status, user1, user2
-> Append (cost=4.60..218.55 rows=54 width=1187) (actual time=0.024..0.076 rows=26 loops=1)
-> Bitmap Heap Scan on "Trade" (cost=4.60..89.99 rows=22 width=155) (actual time=0.024..0.061 rows=23 loops=1)
Recheck Cond: (user1 = 1)
Heap Blocks: exact=20
-> Bitmap Index Scan on trade_depositor_user_id (cost=0.00..4.59 rows=22 width=0) (actual time=0.016..0.016 rows=23 loops=1)
Index Cond: (user1 = 1)
-> Bitmap Heap Scan on "Trade" "Trade_1" (cost=4.67..128.02 rows=32 width=155) (actual time=0.011..0.014 rows=3 loops=1)
Recheck Cond: (user2 = 1)
Heap Blocks: exact=3
-> Bitmap Index Scan on trade_withdrawer_user_id (cost=0.00..4.67 rows=32 width=0) (actual time=0.009..0.009 rows=3 loops=1)
Index Cond: (user2 = 1)
Planning time: 0.224 ms
Execution time: 0.189 ms
Consistently 280-350 ms:
SELECT * FROM "Trade" WHERE "user1" = 1
Bitmap Heap Scan on "Trade" (cost=4.60..89.99 rows=22 width=155) (actual time=0.023..0.054 rows=23 loops=1)
Recheck Cond: (user1 = 1)
Heap Blocks: exact=20
-> Bitmap Index Scan on trade_user1 (cost=0.00..4.59 rows=22 width=0) (actual time=0.015..0.015 rows=23 loops=1)
Index Cond: (user2 = 1)
Planning time: 0.077 ms
Execution time: 0.078 ms
Both queries return result sets of equal size. I have tried different variations of the simpler query, such as ordering by ID ASC/DESC.
To make a fair comparison, you should use UNION ALL, which simply concatenates the results, rather than UNION, which also deduplicates them (usually by sorting).
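As a minimal sketch of the UNION ALL suggestion, here is the query from the question with only the set operator changed. The table and column names are the ones used in the question; nothing else is assumed.

```sql
-- Same two branches as in the question, concatenated without deduplication.
-- Caveat: a row with user1 = 1 AND user2 = 1 would now be returned twice,
-- because UNION ALL does not remove duplicates.
EXPLAIN ANALYZE
SELECT * FROM "Trade" WHERE "user1" = 1
UNION ALL
SELECT * FROM "Trade" WHERE "user2" = 1
LIMIT 24;
```

With UNION ALL one would expect the plan to drop the HashAggregate step seen in the UNION plan, leaving roughly a Limit over an Append of the two scans. That is an expectation, not a measured result for this table.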
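Perhaps the optimizer is getting confused and applies different heuristics because of the deduplication step, and the small row counts produce an edge case that makes no sense.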
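It turns out the timings I was using came from PgAdmin. After connecting over SSH to the actual database network/server, I can see that the difference between the two variants is negligible and in fact matches what EXPLAIN ANALYSE reports.
So in reality, performing the UNION is not faster than the single query.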
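For anyone wanting to repeat this check, the following sketch shows one way to separate server-side execution time from client-observed time in a psql session. It assumes psql is run on or near the database host; `\timing` is a psql meta-command rather than SQL, and the query is the one from the question.

```sql
-- psql meta-command: print the elapsed time the client observes per statement.
\timing on

-- Server-side view: the reported "Execution time" does not include
-- network transfer or client-side rendering of the rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM "Trade" WHERE "user1" = 1;

-- Client-side view: the time printed by \timing for the real query
-- includes the round trip and transfer of the result set.
SELECT * FROM "Trade" WHERE "user1" = 1;
```

If the two numbers differ widely, the gap is more likely due to the network or the client tool than to the query plan.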