Why does PostgreSQL produce such an expensive plan for a simple query?

Question

I have a "Zemla" table with 25 million rows and the following index:

CREATE INDEX zemla_level
  ON public."Zemla"
  USING btree
  (level);

Now I run a simple query:

select * from "Zemla" where level = 7

and get a very expensive query plan:

Bitmap Heap Scan on "Zemla"  (cost=18316.26..636704.15 rows=978041 width=181) (actual time=216.681..758.663 rows=975247 loops=1)
  Recheck Cond: (level = 7)
  Heap Blocks: exact=54465
  ->  Bitmap Index Scan on zemla_level  (cost=0.00..18071.74 rows=978041 width=0) (actual time=198.041..198.041 rows=1949202 loops=1)
        Index Cond: (level = 7)

There is also another simple query that I thought should execute instantly, given that the index exists:

select count(*) from "Zemla" where level = 7

Aggregate  (cost=639149.25..639149.26 rows=1 width=0) (actual time=1188.366..1188.366 rows=1 loops=1)
  ->  Bitmap Heap Scan on "Zemla"  (cost=18316.26..636704.15 rows=978041 width=0) (actual time=213.918..763.833 rows=975247 loops=1)
        Recheck Cond: (level = 7)
        Heap Blocks: exact=54465
        ->  Bitmap Index Scan on zemla_level  (cost=0.00..18071.74 rows=978041 width=0) (actual time=195.409..195.409 rows=1949202 loops=1)
              Index Cond: (level = 7)

My question is: why does PostgreSQL perform a Bitmap Heap Scan after the initial index scan, and why is it so expensive?

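Edit: What is a "Bitmap heap scan" in a query plan? is a different question, because it answers why some queries that use the OR operator get a bitmap heap scan. My query has neither an OR nor an AND operator.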

Answer 1

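If I remember correctly, a Bitmap Heap Scan is the algorithm used to fetch the data from disk. It works out all the disk pages the engine has to fetch and sorts them into physical order, to minimize hard-disk head movement.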

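This takes time because your table is evidently very large and may be heavily fragmented on disk: the plan reports 54,465 heap blocks read to return about 975,000 rows.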
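One way to see what the bitmap scan is buying here, as a rough sketch: turn it off for the current session only and compare the resulting plan with the one in the question. enable_bitmapscan is a standard planner setting; the timings you get will depend on your data and cache state, so no result is predicted here.

```sql
-- Sketch: compare the bitmap plan against whatever the planner picks without it.
-- This affects the current session only; do not set it globally.
SET enable_bitmapscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM "Zemla" WHERE level = 7;

-- Restore the default planner behaviour for this session.
RESET enable_bitmapscan;
```

If the alternative plan turns out slower, the bitmap heap scan was the cheaper way to fetch that many rows.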

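For your second query, count(*), PostgreSQL still needs to read the result rows to verify that they exist (that is, that they are visible to your transaction); other database systems may only need to consult the index in this case. See this page for more information: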

https://wiki.postgresql.org/wiki/Index-only_scans
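
As a hedged sketch of what that page describes: an index-only scan can skip the heap only for pages marked all-visible in the visibility map, and a plain VACUUM is what sets those marks. Assuming a PostgreSQL version with index-only scans (9.2 or later), something like the following may let the count be answered from the zemla_level index alone. Whether the planner actually chooses that plan depends on your statistics and settings.

```sql
-- Plain VACUUM (not FULL) updates the visibility map; ANALYZE refreshes statistics.
VACUUM (ANALYZE) "Zemla";

-- Look for "Index Only Scan" and a low "Heap Fetches" number in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM "Zemla" WHERE level = 7;
```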

Try running VACUUM FULL on the table and see whether it speeds things up.
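
Before reaching for VACUUM FULL, it may be worth checking how many dead rows the table actually holds, since VACUUM FULL rewrites the whole table and holds an exclusive lock on it while it runs. A minimal check, using the standard pg_stat_user_tables statistics view:

```sql
-- Rough indicator of bloat: compare dead rows against live rows.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'Zemla';

-- Only if the dead-row count suggests heavy bloat.
-- Blocks reads and writes on the table until it finishes.
VACUUM FULL "Zemla";
```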


postgresql

performance

query-planner