Join performance in Postgres
I have two tables, scheduling_flownode and xact_message, that are only loosely related. I am trying to run the following query:
set search_path='ad_96d5be';
explain analyze
SELECT f.id, f.target_object_id
FROM "scheduling_flownode" f,
"xact_message" m
where f.target_object_id = m.id
and f.root_node=True
AND f.state=1
and m.state=4
and m.templatelanguage_id IN (17, 18, 19, 20, 21, 22, 23, 24);
When I run it, I get the following query plan:
Gather (cost=252701.26..1711972.04 rows=374109 width=8) (actual time=17737.908..164181.063 rows=441130 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=35705 read=1346425, temp read=18190 written=18148
  -> Hash Join (cost=251701.26..1673561.14 rows=155879 width=8) (actual time=18805.587..163991.468 rows=147043 loops=3)
       Hash Cond: (f.target_object_id = m.id)
       Buffers: shared hit=35705 read=1346425, temp read=18190 written=18148
       -> Parallel Bitmap Heap Scan on scheduling_flownode f (cost=124367.21..1523127.76 rows=2061083 width=8) (actual time=963.910..155466.840 rows=1642157 loops=3)
            Recheck Cond: (state = 1)
            Rows Removed by Index Recheck: 44
            Filter: root_node
            Rows Removed by Filter: 12406874
            Heap Blocks: exact=10570 lossy=427078
            Buffers: shared read=1328631
            -> Bitmap Index Scan on "root-node-and-state" (cost=0.00..123130.57 rows=4946600 width=0) (actual time=955.044..955.045 rows=4926472 loops=1)
                 Index Cond: ((root_node = true) AND (state = 1))
                 Buffers: shared read=13464
       -> Hash (cost=120677.64..120677.64 rows=405712 width=4) (actual time=7124.131..7124.131 rows=441128 loops=3)
            Buckets: 131072 Batches: 8 Memory Usage: 2966kB
            Buffers: shared hit=35591 read=17793, temp written=3384
            -> Bitmap Heap Scan on xact_message m (cost=7893.56..120677.64 rows=405712 width=4) (actual time=61.307..6925.456 rows=441128 loops=3)
                 Recheck Cond: (state = 4)
                 Filter: (templatelanguage_id = ANY ('{17,18,19,20,21,22,23,24}'::integer[]))
                 Rows Removed by Filter: 4
                 Heap Blocks: exact=16585
                 Buffers: shared hit=35591 read=17793
                 -> Bitmap Index Scan on "state-index" (cost=0.00..7792.13 rows=421826 width=0) (actual time=58.781..58.781 rows=441132 loops=3)
                      Index Cond: (state = 4)
                      Buffers: shared hit=2420 read=1209
Planning time: 1.382 ms
Execution time: 164289.481 ms
(31 rows)
scheduling_flownode has more than 400,00,000 entries and xact_message has around 50,00,000 rows. I am on Postgres 10. Am I wrong to assume that Postgres should handle this load easily? If it should, is there something wrong with my query?
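You haven't shown which indexes you have, but I strongly suggest that your indexes cover all the columns you filter on.
In Postgres 11 this can be done with covering indexes (the INCLUDE clause), so on the table scheduling_flownode, for example, you would have an index like this: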
CREATE INDEX ix_scheduling_flownode_target_object_id
ON scheduling_flownode(target_object_id)
INCLUDE (state, root_node);
In Postgres 10, simply put the columns in the index key itself:
CREATE INDEX ix_scheduling_flownode_target_object_id
ON scheduling_flownode(target_object_id, state, root_node);
Do the same for the table xact_message with the columns templatelanguage_id and state.
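Applied to xact_message, the same idea might look like the sketch below. The index names are made up for illustration, and using id as the leading column is an assumption that mirrors the scheduling_flownode example (the join is on m.id). Create only the variant that matches your server version, then re-run the plan to see whether the planner actually picks the new indexes.

```sql
-- Postgres 11+: key on the join column, filter columns carried as payload.
-- (index name is hypothetical)
CREATE INDEX ix_xact_message_id_covering
    ON xact_message (id)
    INCLUDE (state, templatelanguage_id);

-- Postgres 10: no INCLUDE clause, so the filter columns go into the key.
-- (index name is hypothetical)
CREATE INDEX ix_xact_message_id_state_lang
    ON xact_message (id, state, templatelanguage_id);

-- Refresh planner statistics, then compare the new plan with the old one.
ANALYZE scheduling_flownode;
ANALYZE xact_message;

EXPLAIN (ANALYZE, BUFFERS)
SELECT f.id, f.target_object_id
FROM scheduling_flownode f
JOIN xact_message m ON f.target_object_id = m.id
WHERE f.root_node = true
  AND f.state = 1
  AND m.state = 4
  AND m.templatelanguage_id IN (17, 18, 19, 20, 21, 22, 23, 24);
```

Whether these indexes help depends on the data distribution, so treat the new plan, not the index definition, as the evidence.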