Why does Postgres refuse to use a composite index in some setups?
So, I have a table that looks like this:
           Table "public.rule_traffic"
      Column       |  Type   |                       Modifiers
-------------------+---------+--------------------------------------------------------
 id                | bigint  | not null default nextval('rule_traffic_seq'::regclass)
 device_id         | integer | not null
 version_id        | integer | not null
 policy_name       | text    |
 rule_uid          | uuid    | not null
 traffic_hash_code | bigint  | not null
 action            | integer |
Along with these indexes:
"rule_traffic_pkey" PRIMARY KEY, btree (id)
"unique_device_id_version_id_policy_name_uid_in_rule_traffic" UNIQUE, btree (device_id, version_id, policy_name, rule_uid)
When I run a test query against my setup (and many other setups), the defined index unique_device_id_version_id_policy_name_uid_in_rule_traffic is actually used:
QUERY PLAN
HashAggregate (cost=8.29..8.30 rows=1 width=56) (actual time=1.563..1.563 rows=0 loops=1)
-> Index Scan using unique_device_id_version_id_policy_name_uid_in_rule_traffic on rule_traffic this_ (cost=0.00..8.28 rows=1 width=56) (actual time=1.558..1.558 rows=0 loops=1)
Index Cond: ((device_id = 11) AND (policy_name IS NULL))
Filter: ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid))
Total runtime: 1.704 ms
But on one setup the query plan is completely different (a sequential scan):
QUERY PLAN
HashAggregate (cost=150538.23..150538.25 rows=2 width=56) (actual time=2403.600..2403.601 rows=2 loops=1)
-> Seq Scan on rule_traffic this_ (cost=0.00..150538.20 rows=4 width=56) (actual time=2354.481..2403.573 rows=2 loops=1)
Filter: ((policy_name IS NULL) AND (device_id = 11) AND ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid)))
Total runtime: 2403.661 ms
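I tried running VACUUM FULL / ANALYZE on the table, but it made no difference.
Does anyone know why Postgres decides not to use the composite index?
Update 1:
Trying to force the planner not to use a sequential scan. First, the plan with the default settings: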
securetrack=# explain analyze select max(this_.id) as y0_, this_.rule_uid as y1_, this_.policy_name as y2_ from rule_traffic this_ where this_.device_id=11 and ((this_.rule_uid='f6c0dc29-e741-4f9a-adf1-f11d18768af3' and this_.policy_name is null) OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' and this_.policy_name is null)) group by this_.rule_uid, this_.policy_name;
QUERY PLAN
HashAggregate (cost=209498.38..209498.40 rows=2 width=56) (actual time=2475.980..2475.981 rows=2 loops=1)
-> Seq Scan on rule_traffic this_ (cost=0.00..209498.35 rows=4 width=56) (actual time=1631.945..2475.950 rows=3 loops=1)
Filter: ((policy_name IS NULL) AND (device_id = 11) AND ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid)))
Total runtime: 2476.038 ms
(4 rows)
Setting enable_seqscan = false:
securetrack=# SET enable_seqscan=false;
SET
securetrack=# explain analyze select max(this_.id) as y0_, this_.rule_uid as y1_, this_.policy_name as y2_ from rule_traffic this_ where this_.device_id=11 and ((this_.rule_uid='f6c0dc29-e741-4f9a-adf1-f11d18768af3' and this_.policy_name is null) OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' and this_.policy_name is null)) group by this_.rule_uid, this_.policy_name;
QUERY PLAN
HashAggregate (cost=371469.08..371469.10 rows=2 width=56) (actual time=2936.608..2936.610 rows=2 loops=1)
-> Bitmap Heap Scan on rule_traffic this_ (cost=197981.02..371469.05 rows=4 width=56) (actual time=2308.843..2936.577 rows=3 loops=1)
Recheck Cond: ((device_id = 11) AND (policy_name IS NULL))
Filter: ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid))
-> Bitmap Index Scan on unique_device_id_version_id_policy_name_uid_in_rule_traffic (cost=0.00..197981.02 rows=5774287 width=0) (actual time=1283.603..1283.603 rows=5849739 loops=1)
Index Cond: ((device_id = 11) AND (policy_name IS NULL))
Total runtime: 2936.680 ms
(7 rows)
It looks like the cost is actually higher.
How is that possible?
PostgreSQL is doing the right thing here.
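If you look at the query plan where the index is forced, you will see that the bitmap index scan finds 5849739 rows matching (device_id = 11) AND (policy_name IS NULL), all of which must then be fetched from the table and checked against the rule_uid filter.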
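Scanning such a large part of the index and then rechecking every table row it points to is more expensive than a sequential scan of the whole table (sequential reads are usually faster than random-access reads).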
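To see how unselective the index condition is on the slow setup, something like the following could be run there. This is only a sketch: it uses the table and column names from the question and the standard pg_class and pg_stats catalogs, and the actual numbers will depend on the data in that installation.

```sql
-- How many rows satisfy the part of the WHERE clause that the index can use,
-- compared with the total number of rows in the table?
SELECT count(*) AS total_rows,
       sum(CASE WHEN device_id = 11 AND policy_name IS NULL
                THEN 1 ELSE 0 END) AS rows_matching_index_cond
FROM rule_traffic;

-- Size of the table as the planner sees it (pages and estimated row count).
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'rule_traffic';

-- Column statistics the planner bases its row estimates on.
SELECT attname, null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'rule_traffic'
  AND attname IN ('device_id', 'policy_name');
```

If rows_matching_index_cond is a large fraction of total_rows, as the 5849739 rows in the forced plan suggest, the index does little to narrow the search and a sequential scan is the cheaper choice.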
Using EXPLAIN (ANALYZE, BUFFERS) is instructive, because it shows how many database blocks are actually accessed.
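As a rough sketch of that comparison, the query from the question can be run once with the default settings and once with sequential scans disabled, looking at the "Buffers:" lines of each plan. The transaction with SET LOCAL is just one way to keep the setting change from leaking into the rest of the session.

```sql
-- Plan chosen by the planner, with buffer statistics.
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(this_.id) AS y0_, this_.rule_uid AS y1_, this_.policy_name AS y2_
FROM rule_traffic this_
WHERE this_.device_id = 11
  AND ((this_.rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3' AND this_.policy_name IS NULL)
    OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' AND this_.policy_name IS NULL))
GROUP BY this_.rule_uid, this_.policy_name;

-- Same query with sequential scans discouraged, for this transaction only.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(this_.id) AS y0_, this_.rule_uid AS y1_, this_.policy_name AS y2_
FROM rule_traffic this_
WHERE this_.device_id = 11
  AND ((this_.rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3' AND this_.policy_name IS NULL)
    OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' AND this_.policy_name IS NULL))
GROUP BY this_.rule_uid, this_.policy_name;
ROLLBACK;
```

Comparing the shared hit and read counts of the two plans shows which one touches more blocks on that particular setup.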