Postgres partial vs regular index

Question

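I have a table with 1 million records, of which 100k are NULL in colA. The remaining records have mostly distinct values. Is there a difference between creating a regular index on this column and a partial index with where colA is not null?

Since regular Postgres indexes do not store NULL values, wouldn't that be the same as creating a partial index with where colA is not null?
What are the pros and cons of each index?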

Answer 1

If you create a partial index that excludes nulls, Postgres will not use it to find nulls.

Here is a test with a full index on Postgres 13.5.

# create index idx_test_num on test(num);
CREATE INDEX

# explain select count(*) from test where num is null;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Aggregate  (cost=5135.00..5135.01 rows=1 width=8)
   ->  Bitmap Heap Scan on test  (cost=63.05..5121.25 rows=5500 width=0)
         Recheck Cond: (num IS NULL)
         ->  Bitmap Index Scan on idx_test_num  (cost=0.00..61.68 rows=5500 width=0)
               Index Cond: (num IS NULL)
(5 rows)

And with a partial index.

# create index idx_test_num on test(num) where num is not null;
CREATE INDEX

# explain select count(*) from test where num is null;
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=10458.12..10458.13 rows=1 width=8)
   ->  Gather  (cost=10457.90..10458.11 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=9457.90..9457.91 rows=1 width=8)
               ->  Parallel Seq Scan on test  (cost=0.00..9352.33 rows=42228 width=0)
                     Filter: (num IS NULL)
(6 rows)
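To reproduce the comparison yourself, something like the following should work. It is only a sketch: the answer does not show how test was built, so the table definition, the id column, and the 10% NULL ratio (taken from the question) are my assumptions. Your plans and cost numbers will differ from the ones above.

```sql
-- Hypothetical setup; only the names "test", "num" and "idx_test_num"
-- come from the transcripts above.
CREATE TABLE test (
    id  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    num integer
);

-- 1,000,000 rows, every 10th row NULL in num (assumed distribution).
INSERT INTO test (num)
SELECT CASE WHEN g % 10 = 0 THEN NULL ELSE g END
FROM generate_series(1, 1000000) AS g;

ANALYZE test;

-- Full index: NULL entries are stored, so IS NULL can use the index.
CREATE INDEX idx_test_num ON test (num);
EXPLAIN SELECT count(*) FROM test WHERE num IS NULL;

-- Swap in the partial index: it holds no NULL entries,
-- so the same query cannot use it.
DROP INDEX idx_test_num;
CREATE INDEX idx_test_num ON test (num) WHERE num IS NOT NULL;
EXPLAIN SELECT count(*) FROM test WHERE num IS NULL;
```

Note the DROP INDEX between the two runs, since both indexes share the same name.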

Since regular postgres indexes do not store NULL values...

That has not been true since version 8.2 [checks notes] 16 years ago. The 8.2 docs say...

Indexes are not used for IS NULL clauses by default. The best way to use indexes in such cases is to create a partial index using an IS NULL predicate.

8.3 introduced nulls first and nulls last, along with many other improvements around null, including...

Allow col IS NULL to use an index (Teodor)
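As a rough illustration of those 8.3 features, assuming the same test(num) table as in the transcripts above (the index name here is made up):

```sql
-- Hypothetical index name; NULLS FIRST / NULLS LAST are available
-- in index definitions and in ORDER BY since 8.3.
CREATE INDEX idx_test_num_nulls_first ON test (num NULLS FIRST);

-- A sort order matching the index definition can be read from the index.
EXPLAIN SELECT num FROM test ORDER BY num NULLS FIRST LIMIT 10;

-- IS NULL is an indexable condition on a regular B-tree index.
EXPLAIN SELECT count(*) FROM test WHERE num IS NULL;
```

Whether the planner actually picks the index depends on table statistics and cost settings.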

Answer 2

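It all depends.

NULL values have been included in (default) B-tree indexes since Postgres 8.3, as Schwern pointed out. However, the predicate you mention (where colA is not null) has only been properly supported since Postgres 9.0. The release notes: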

Allow IS NOT NULL restrictions to use indexes (Tom Lane)

This is particularly useful for finding MAX()/MIN() values in indexes that contain many null values.
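A sketch of what that release note refers to, again assuming the test(num) table with an index on num. The second query is my own spelled-out approximation of how the planner treats the aggregate, not something quoted from the release notes:

```sql
-- With a B-tree index on num, max() can be answered from one end of the index.
EXPLAIN SELECT max(num) FROM test;

-- Roughly what the planner does internally: walk the index backward
-- and skip NULL entries via an IS NOT NULL index condition.
EXPLAIN
SELECT num
FROM   test
WHERE  num IS NOT NULL
ORDER  BY num DESC
LIMIT  1;
```

On 9.0 or later you would expect to see Index Cond: (num IS NOT NULL) in the first plan as well.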

GIN indexes followed later:

As of PostgreSQL 9.1, null key values can be included in the index.

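Typically, a partial index makes sense if it excludes a major part of the table from the index, making it substantially smaller and saving writes to the index. Since B-tree indexes are so shallow, bare seek performance scales very well (once the index is cached). 10% fewer index entries hardly matter in that respect.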
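If you want to check how much smaller the partial index really is, a comparison along these lines is one option. The index names are hypothetical, and the table is the test(num) example from the other answer:

```sql
-- Build both variants side by side (hypothetical names).
CREATE INDEX idx_test_num_full    ON test (num);
CREATE INDEX idx_test_num_partial ON test (num) WHERE num IS NOT NULL;

-- Compare on-disk sizes of the two indexes.
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM   pg_class
WHERE  relname IN ('idx_test_num_full', 'idx_test_num_partial');
```

The exact ratio depends on your Postgres version and data, so measure rather than assume.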

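Your case would exclude only around 10% of all rows, and that rarely pays off. A partial index adds some overhead for the query planner and rules out queries that don't match the index condition. (The Postgres query planner doesn't try hard if the match is not immediately obvious.)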
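To make "matching the index condition" concrete, here is a sketch against the test(num) table from the other answer. The comments describe what I would expect, so verify with EXPLAIN on your version:

```sql
-- Assumes a partial index like:
--   CREATE INDEX idx_test_num ON test (num) WHERE num IS NOT NULL;

-- Candidate: "num = 42" can only be true for non-null num,
-- so the planner can prove the index predicate.
EXPLAIN SELECT * FROM test WHERE num = 42;

-- Candidate for the same reason: a range condition rejects NULL.
EXPLAIN SELECT * FROM test WHERE num BETWEEN 100 AND 200;

-- Not a candidate: nothing in the query excludes NULL rows,
-- so the partial index cannot produce the full result.
EXPLAIN SELECT * FROM test ORDER BY num LIMIT 10;

-- Candidate again once the predicate is spelled out.
EXPLAIN SELECT * FROM test WHERE num IS NOT NULL ORDER BY num LIMIT 10;
```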

OTOH, Postgres will rarely use an index to retrieve 10% of the table; a sequential scan will typically be faster. Again, it depends.

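If (almost) all queries exclude NULL anyway (in a way the Postgres planner understands), then a partial index excluding only 10% of all rows is still a sensible option. But it may backfire if query patterns change. The added complexity may not be worth it.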

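Also worth noting that there are still corner cases with NULL values in Postgres indexes. I bumped into one recently, where Postgres proved unwilling to read sorted rows from a multicolumn index when the first index expression is filtered with IS NULL (making a partial index preferable for that case):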

db<>fiddle here
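The fiddle itself is not reproduced here. The following is only my guess at the general shape of such a case, with made-up table, column, and index names, so treat it as an illustration rather than the original test:

```sql
-- Hypothetical schema, not taken from the fiddle.
CREATE TABLE tbl (a integer, b integer);
CREATE INDEX tbl_a_b_idx ON tbl (a, b);

-- Filter the leading index column with IS NULL and ask for rows sorted by b.
-- Check whether the plan contains a separate Sort step.
EXPLAIN SELECT * FROM tbl WHERE a IS NULL ORDER BY b LIMIT 10;

-- Partial index alternative: only rows with a IS NULL, ordered by b.
CREATE INDEX tbl_b_a_null_idx ON tbl (b) WHERE a IS NULL;
EXPLAIN SELECT * FROM tbl WHERE a IS NULL ORDER BY b LIMIT 10;
```

Whether the first query needs an extra sort may well depend on the Postgres version, so compare both plans on your own setup.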

So, it depends on the complete picture.

postgresql

indexing

null

partial-index