Index size after autovacuum

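Good day. I am reading the official Postgres documentation on the Vacuum process and the Reindex routine (the documentation for version 12). A few sentences are unclear to me, so I would like to clarify them.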

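First, I do understand that autovacuum checks a table for dead tuples, stores their locations in a special memory area sized by "maintenance_work_mem", and then, when this memory is full, vacuum deletes the corresponding pages in all indexes that refer to those locations. The documentation on reindexing says: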

B-tree index pages that have become completely empty are reclaimed for re-use. However, there is still a possibility of inefficient use of space: if all but a few index keys on a page have been deleted, the page remains allocated

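The question is: if the "page remains allocated", does that mean autovacuum does not return the physical space of deleted pages inside the index to the OS? For example, an index occupies 1 GB. I delete all rows except one from the table and run vacuum. In that case the index will still occupy 1 GB. Am I right?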

That is true for VACUUM (but not for VACUUM FULL):

select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)

create table t(s text);
CREATE TABLE

insert into t select generate_series(1,300000)::text;
INSERT 0 300000

select pg_size_pretty(pg_table_size('t'));
 pg_size_pretty 
----------------
 10 MB
(1 row)

create index on t(s);
CREATE INDEX

select pg_size_pretty(pg_indexes_size('t'));
 pg_size_pretty 
----------------
 6600 kB
(1 row)

delete from t where s <> '1';
DELETE 299999

select count(*) from t;
 count 
-------
     1
(1 row)

select pg_size_pretty(pg_table_size('t'));
 pg_size_pretty 
----------------
 10 MB
(1 row)

select pg_size_pretty(pg_indexes_size('t'));
 pg_size_pretty 
----------------
 6600 kB
(1 row)

vacuum t;
VACUUM
select pg_size_pretty(pg_table_size('t'));
 pg_size_pretty 
----------------
 48 kB
(1 row)

select pg_size_pretty(pg_indexes_size('t'));
 pg_size_pretty 
----------------
 6600 kB
(1 row)

vacuum full t;
VACUUM
select pg_size_pretty(pg_table_size('t'));
 pg_size_pretty 
----------------
 16 kB
(1 row)

select pg_size_pretty(pg_indexes_size('t'));
 pg_size_pretty 
----------------
 16 kB
(1 row)

It is not true for REINDEX:

select version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)

create table t(s text);
CREATE TABLE

insert into t select generate_series(1,300000)::text;
INSERT 0 300000

select pg_size_pretty(pg_table_size('t'));
 pg_size_pretty 
----------------
 10 MB
(1 row)

create index on t(s);
CREATE INDEX

select pg_size_pretty(pg_indexes_size('t'));
 pg_size_pretty 
----------------
 6600 kB
(1 row)

delete from t where s <> '1';
DELETE 299999

select count(*) from t;
 count 
-------
     1
(1 row)

select pg_size_pretty(pg_table_size('t'));
 pg_size_pretty 
----------------
 10 MB
(1 row)

select pg_size_pretty(pg_indexes_size('t'));
 pg_size_pretty 
----------------
 6600 kB
(1 row)

reindex table t;
REINDEX

select pg_size_pretty(pg_table_size('t'));
 pg_size_pretty 
----------------
 10 MB
(1 row)

select pg_size_pretty(pg_indexes_size('t'));
 pg_size_pretty 
----------------
 16 kB
(1 row)
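
Note that pg_indexes_size reports the total for all indexes on the table. If a table has several indexes, a query along the following lines would show them one by one. This is only a sketch; it assumes the table is named t as in the sessions above.

```sql
-- List every index on table t with its own on-disk size.
select indexrelid::regclass as index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_index
where indrelid = 't'::regclass;
```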

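The README in src/backend/access/nbtree has a lot of in-depth information on this. The quotes in this answer come from there.

If you really delete all but one row in the table, almost all pages in the index will be deleted.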

We consider deleting an entire page from the btree only when it's become completely empty of items. (Merging partly-full pages would allow better space reuse, but it seems impractical to move existing data items left or right to make this happen --- a scan moving in the opposite direction might miss the items if so.) Also, we never delete the rightmost page on a tree level (this restriction simplifies the traversal algorithms, as explained below). Page deletion always begins from an empty leaf page. An internal page can only be deleted as part of deleting an entire subtree. This is always a "skinny" subtree consisting of a "chain" of internal pages plus a single leaf page. There is one page on each level of the subtree, and each level/page covers the same key space.

The space is not released to the operating system, however:

Reclaiming a page doesn't actually change its state on disk --- we simply record it in the shared-memory free space map, from which it will be handed out the next time a new page is needed for a page split.
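
One way to observe this, offered as a sketch rather than a definitive recipe, is the pgstatindex function from the pgstattuple contrib extension. The example assumes that the extension can be installed on your server and that the index on t(s) got the default name t_s_idx; both are assumptions, so check the name with \d t first.

```sql
-- Assumes the pgstattuple contrib module is available on the server.
create extension if not exists pgstattuple;

-- t_s_idx is the name PostgreSQL normally generates for "create index on t(s)";
-- adjust it if your index is named differently.
select index_size,
       leaf_pages,
       empty_pages,
       deleted_pages,
       avg_leaf_density
from pgstatindex('t_s_idx');
```

After a plain VACUUM, a large deleted_pages count next to an unchanged index_size would be consistent with the quote: the pages can be reused inside the index file, but the file itself has not shrunk.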

The tree will become "skinny", because the depth of the index never shrinks. PostgreSQL has an optimization for that:

Because we never delete the rightmost page of any level (and in particular never delete the root), it's impossible for the height of the tree to decrease. After massive deletions we might have a scenario in which the tree is "skinny", with several single-page levels below the root. Operations will still be correct in this case, but we'd waste cycles descending through the single-page levels. To handle this we use an idea from Lanin and Shasha: we keep track of the "fast root" level, which is the lowest single-page level. The meta-data page keeps a pointer to this level as well as the true root. All ordinary operations initiate their searches at the fast root not the true root.
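
If you want to look at the fast root yourself, the bt_metap function from the pageinspect contrib extension exposes the B-tree metapage. This is a minimal sketch that assumes pageinspect is available (it needs superuser rights) and that the index is named t_s_idx, which is the default name and not something confirmed above.

```sql
-- Assumes the pageinspect contrib module is available; it requires superuser.
create extension if not exists pageinspect;

-- root/level describe the true root, fastroot/fastlevel the fast root.
select root, level, fastroot, fastlevel
from bt_metap('t_s_idx');
```

If fastlevel is lower than level, searches start below the true root, which is the situation the README describes.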

If you run REINDEX INDEX on the index or VACUUM (FULL) on the table, the index is rebuilt and the space is released.
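
As a rough sketch, the options look like this. The index name t_s_idx is an assumption (the default name for an index on t(s)); pick whichever statement fits your locking constraints.

```sql
-- Rebuild one index; blocks writes to the table while it runs.
reindex index t_s_idx;

-- PostgreSQL 12 and later: rebuild without blocking writes
-- (slower, and cannot run inside a transaction block).
reindex index concurrently t_s_idx;

-- Rewrite the table and all of its indexes; takes an ACCESS EXCLUSIVE lock.
vacuum (full) t;

-- Check the result.
select pg_size_pretty(pg_indexes_size('t'));
```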