Why is postgres no longer using the index if I use another ORDER BY?

Question

I couldn't find an answer to my question here, so I'm asking:

I have the following table with about 18 million rows:

# SELECT COUNT(1) from report;
  count   
----------
 18090892
(1 row)

# \d report
                          Table "public.report"
   Column    |           Type           | Collation | Nullable | Default 
-------------+--------------------------+-----------+----------+---------
 reporter_id | uuid                     |           | not null | 
 parsed      | boolean                  |           | not null | 
 id          | text                     |           | not null | 
 request_id  | uuid                     |           |          | 
 created     | timestamp with time zone |           | not null | now()
 customer    | text                     |           | not null | 
 subject     | text                     |           |          | 
Indexes:
    "PK_99e4d0bea58cba73c57f935a546" PRIMARY KEY, btree (id)
    "idx_report_created_desc" btree (created DESC)
    "idx_report_reporter_id_asc_created_desc" btree (reporter_id, created DESC)
    "idx_report_request_id_asc_created_desc" btree (request_id, created DESC)
Foreign-key constraints:
    "FK_5b809608bb38d119333b69f65f9" FOREIGN KEY (request_id) REFERENCES request(id)
    "FK_d41df66b60944992386ed47cf2e" FOREIGN KEY (reporter_id) REFERENCES reporter(id)

If I use ORDER BY created DESC LIMIT 25, the index is used:

# EXPLAIN ANALYZE SELECT * FROM report ORDER BY created DESC LIMIT 25;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..2.49 rows=25 width=169) (actual time=0.035..0.063 rows=25 loops=1)
   ->  Index Scan using idx_report_created_desc on report  (cost=0.44..1482912.16 rows=18090892 width=169) (actual time=0.033..0.051 rows=25 loops=1)
 Planning Time: 0.239 ms
 Execution Time: 0.105 ms
(4 rows)

However, if I use ORDER BY created DESC, id ASC LIMIT 25, the index is no longer used:

# EXPLAIN ANALYZE SELECT * FROM "report" ORDER BY "created" DESC, "id" ASC LIMIT 25;
                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=587891.07..587893.99 rows=25 width=169) (actual time=2719.606..2726.355 rows=25 loops=1)
   ->  Gather Merge  (cost=587891.07..2346850.67 rows=15075744 width=169) (actual time=2711.873..2718.618 rows=25 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=586891.04..605735.72 rows=7537872 width=169) (actual time=2643.445..2643.448 rows=21 loops=3)
               Sort Key: created DESC, id
               Sort Method: top-N heapsort  Memory: 35kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 32kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 31kB
               ->  Parallel Seq Scan on report  (cost=0.00..374177.72 rows=7537872 width=169) (actual time=0.018..1910.204 rows=6030297 loops=3)
 Planning Time: 0.396 ms
 JIT:
   Functions: 1
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 4.757 ms, Inlining 0.172 ms, Optimization 5.053 ms, Emission 2.003 ms, Total 11.985 ms
 Execution Time: 2731.226 ms
(16 rows)

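If I understand correctly, the index should still be used because the same set of results should be returned, only possibly in a different order determined by ORDER BY id ASC.

So I wonder why postgres decides to do a parallel seq scan instead of using the index and then sorting the returned 25 rows by id? This should definitely be faster than a parallel seq scan, shouldn't it?

Or where am I wrong?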

Answer 1

The most likely explanation here is that you currently only have an index on created DESC. As a result, when you execute the following query:

SELECT * FROM report ORDER BY created DESC, id

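the leaf nodes of the single-column created index do not carry the id values. To use this index, Postgres would have to go back to the table itself (the heap; Postgres has no clustered index) for each index entry to fetch id. These lookups can be expensive and can sometimes outweigh the benefit of using the index in the first place.

If you need this two-level sort, then add an index that covers it: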

CREATE INDEX idx_new ON report (created DESC, id);

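Note that, assuming id is the primary key, some databases (e.g. MySQL) would automatically tack the id column onto your current created DESC index. But it appears that Postgres does not do this.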
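As a rough sketch of how that could be tried on a live table: the statements below build the two-column index and then re-check the plan. The index name idx_report_created_desc_id_asc is made up for illustration, and CONCURRENTLY and ANALYZE are additions that are not part of the answer above. Whether the planner actually picks the new index still depends on your statistics and settings.

```sql
-- Hypothetical index name. CONCURRENTLY avoids blocking writes on the
-- ~18M-row table, but it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_report_created_desc_id_asc
    ON report (created DESC, id ASC);

-- Refresh planner statistics, then check which plan is chosen now.
ANALYZE report;

EXPLAIN ANALYZE
SELECT * FROM report ORDER BY created DESC, id ASC LIMIT 25;
```

If the index order matches the ORDER BY exactly, the planner can read the first 25 entries straight from the index, as it did for the single-column query in the question.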

Answer 2

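PostgreSQL is not infinitely smart. There are some things it doesn't figure out, even though it theoretically could.

But it keeps getting smarter. Upgrade to version 13 and see what happens. It should use the index scan plus a very fast 'incremental sort'. The incremental sort only needs to break ties within "created", which I assume are rare.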
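A minimal sketch of how this could be checked after upgrading, assuming a PostgreSQL 13 or later server. enable_incremental_sort is a planner setting that exists from version 13 on and is on by default; toggling it here is only to compare the two plans.

```sql
-- Confirm the server version first; incremental sort needs 13 or later.
SHOW server_version;

-- On by default in 13+; set explicitly so the two plans can be compared.
SET enable_incremental_sort = on;
EXPLAIN ANALYZE
SELECT * FROM report ORDER BY created DESC, id ASC LIMIT 25;

-- For comparison: the planner must choose a plan without incremental sort.
SET enable_incremental_sort = off;
EXPLAIN ANALYZE
SELECT * FROM report ORDER BY created DESC, id ASC LIMIT 25;

-- Restore the session default.
RESET enable_incremental_sort;
```

In the first plan one would hope to see an Incremental Sort node with created as its presorted key on top of an Index Scan using idx_report_created_desc, though the actual choice depends on the planner's cost estimates.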

If I understand correctly, the index should still be used because the same set of results should be returned, only possibly in a different order determined by ORDER BY id ASC

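But in the presence of a LIMIT, returning results in a different order means possibly returning different results. So it has to take special measures to deal with this, which it does in v13.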
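To make the point about LIMIT concrete, here is a small sketch on a throwaway table. The table demo and its rows are invented for illustration. The last query is a hand-written rewrite that handles ties explicitly; it is not a description of what PostgreSQL does internally.

```sql
-- Hypothetical demo table: three rows tie on created, one row is older.
CREATE TEMP TABLE demo (
    id      text PRIMARY KEY,
    created timestamptz NOT NULL
);
INSERT INTO demo VALUES
    ('c', '2021-01-02 00:00:00+00'),
    ('a', '2021-01-02 00:00:00+00'),
    ('b', '2021-01-02 00:00:00+00'),
    ('d', '2021-01-01 00:00:00+00');

-- Deterministic: ties on created are broken by id, so this returns a and b.
SELECT id FROM demo ORDER BY created DESC, id ASC LIMIT 2;

-- Not equivalent: the inner LIMIT may keep any two of a, b, c,
-- so sorting those two rows by id afterwards can miss a or b.
SELECT id
FROM (SELECT * FROM demo ORDER BY created DESC LIMIT 2) AS t
ORDER BY id ASC;

-- Manual tie-aware rewrite: keep every row whose created is at least the
-- 2nd-newest value, then sort and limit. Returns a and b again.
-- Caveat: returns no rows if the table has fewer rows than the limit.
SELECT id
FROM demo
WHERE created >= (SELECT created FROM demo
                  ORDER BY created DESC OFFSET 1 LIMIT 1)
ORDER BY created DESC, id ASC
LIMIT 2;
```

The second query is the "use the index, then sort the returned rows by id" idea from the question; it only matches the first query when there are no ties on created at the cutoff.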

postgresql

indexing

sql-order-by