Why does Postgres refuse to use a composite index in some setups?

I have a table that looks like this:

                                 Table "public.rule_traffic"
          Column       |  Type   |                       Modifiers
     id                | bigint  | not null default nextval('rule_traffic_seq'::regclass)
     device_id         | integer | not null
     version_id        | integer | not null
     policy_name       | text    |
     rule_uid          | uuid    | not null
     traffic_hash_code | bigint  | not null
     action            | integer |

It has these indexes:

"rule_traffic_pkey" PRIMARY KEY, btree (id)
"unique_device_id_version_id_policy_name_uid_in_rule_traffic" UNIQUE, btree (device_id, version_id, policy_name, rule_uid)

When I run a test query on my setup (and on many other setups), the plan shows that the index unique_device_id_version_id_policy_name_uid_in_rule_traffic is actually being used:

                                                                             QUERY PLAN
HashAggregate  (cost=8.29..8.30 rows=1 width=56) (actual time=1.563..1.563 rows=0 loops=1)
->  Index Scan using unique_device_id_version_id_policy_name_uid_in_rule_traffic on rule_traffic this_  (cost=0.00..8.28 rows=1 width=56) (actual time=1.558..1.558 rows=0 loops=1)
     Index Cond: ((device_id = 11) AND (policy_name IS NULL))
     Filter: ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid))
Total runtime: 1.704 ms

But on one setup the query plan is completely different (a sequential scan):

                                                                                    QUERY PLAN
HashAggregate  (cost=150538.23..150538.25 rows=2 width=56) (actual time=2403.600..2403.601 rows=2 loops=1)
->  Seq Scan on rule_traffic this_  (cost=0.00..150538.20 rows=4 width=56) (actual time=2354.481..2403.573 rows=2 loops=1)
     Filter: ((policy_name IS NULL) AND (device_id = 11) AND ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid)))
Total runtime: 2403.661 ms

I tried running VACUUM FULL and ANALYZE on the table, but it made no difference.

Does anyone know why Postgres decides not to use the composite index?

Update 1:

I tried to force the planner not to use a sequential scan. First, the query with default settings:

securetrack=# explain analyze select max(this_.id) as y0_, this_.rule_uid as y1_, this_.policy_name as y2_ from rule_traffic this_ where this_.device_id=11 and ((this_.rule_uid='f6c0dc29-e741-4f9a-adf1-f11d18768af3' and this_.policy_name is null) OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' and this_.policy_name is null)) group by this_.rule_uid, this_.policy_name;

QUERY PLAN
 HashAggregate  (cost=209498.38..209498.40 rows=2 width=56) (actual time=2475.980..2475.981 rows=2 loops=1)
   ->  Seq Scan on rule_traffic this_  (cost=0.00..209498.35 rows=4 width=56) (actual time=1631.945..2475.950 rows=3 loops=1)
     Filter: ((policy_name IS NULL) AND (device_id = 11) AND ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid)))
 Total runtime: 2476.038 ms
(4 rows)

Then with enable_seqscan = false:

securetrack=# SET enable_seqscan=false;
SET
securetrack=# explain analyze select max(this_.id) as y0_, this_.rule_uid as y1_, this_.policy_name as y2_ from rule_traffic this_ where this_.device_id=11 and ((this_.rule_uid='f6c0dc29-e741-4f9a-adf1-f11d18768af3' and this_.policy_name is null) OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' and this_.policy_name is null)) group by this_.rule_uid, this_.policy_name;
                                                                                           QUERY PLAN
 HashAggregate  (cost=371469.08..371469.10 rows=2 width=56) (actual time=2936.608..2936.610 rows=2 loops=1)
   ->  Bitmap Heap Scan on rule_traffic this_  (cost=197981.02..371469.05 rows=4 width=56) (actual time=2308.843..2936.577 rows=3 loops=1)
     Recheck Cond: ((device_id = 11) AND (policy_name IS NULL))
     Filter: ((rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3'::uuid) OR (rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e'::uuid))
     ->  Bitmap Index Scan on unique_device_id_version_id_policy_name_uid_in_rule_traffic  (cost=0.00..197981.02 rows=5774287 width=0) (actual time=1283.603..1283.603 rows=5849739 loops=1)
           Index Cond: ((device_id = 11) AND (policy_name IS NULL))
 Total runtime: 2936.680 ms
(7 rows)

It looks like the cost is actually higher. How is that possible?

PostgreSQL is doing the right thing here.

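If you look at the query plan where the index is forced, you will see that the index scan finds 5849739 rows with (device_id = 11) AND (policy_name IS NULL), and every one of them then has to be checked against the rule_uid filter.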

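Scanning such a large part of the index and then rechecking all the table rows it finds is more expensive than a sequential scan of the whole table (sequential reads are usually faster than random-access reads).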
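To check how selective the index condition is on the slow setup, one could compare the number of rows that match it with the total row count. This is only a sketch: the aliases total_rows and matching_rows are made up for illustration, and the query has to read the whole table, so it is not cheap to run.

```sql
-- Count all rows, and the rows matching the index condition from the plan.
-- count() ignores NULLs, so the CASE expression counts only matching rows.
SELECT count(*) AS total_rows,
       count(CASE WHEN device_id = 11 AND policy_name IS NULL THEN 1 END) AS matching_rows
FROM rule_traffic;
```

If matching_rows is a large fraction of total_rows (the bitmap index scan in the question reports about 5.8 million index entries), an index scan has to visit much of the table anyway, which would explain why the planner prefers the sequential scan.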

Using EXPLAIN (ANALYZE, BUFFERS) is instructive, because it shows how many database blocks are actually accessed.
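As a sketch, this is the query from the question run with the BUFFERS option, which is available in PostgreSQL 9.0 and later. No output is shown here, because it depends on the data and on what is already cached.

```sql
-- Same query as in the question; BUFFERS adds block counts to each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(this_.id) AS y0_, this_.rule_uid AS y1_, this_.policy_name AS y2_
FROM rule_traffic this_
WHERE this_.device_id = 11
  AND ((this_.rule_uid = 'f6c0dc29-e741-4f9a-adf1-f11d18768af3' AND this_.policy_name IS NULL)
    OR (this_.rule_uid = 'c1a12087-2d85-4e44-a115-f9cad7ec915e' AND this_.policy_name IS NULL))
GROUP BY this_.rule_uid, this_.policy_name;
```

Each plan node then carries a "Buffers:" line with shared hit and read counts. Running this once with default settings and once after SET enable_seqscan = false lets you compare how many blocks each plan touches.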