How to delete many rows from a frequently accessed table

Question

I need to delete most (say, 90%) of a very large table (say, 5 million rows). The other 10% of the table is read frequently, but never written to.

From "Best way to delete millions of rows by ID", I gathered that I should drop any index on the 90% I am deleting to speed up the process (except an index I am using to select the rows to delete).

From "PostgreSQL locking mode", I see that this operation will acquire a ROW EXCLUSIVE lock on the entire table. But since I am only reading the other 10%, this ought not to matter.
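One way to confirm which lock the DELETE actually takes is to query pg_locks from inside the deleting transaction. A minimal sketch, assuming a hypothetical demo table lock_demo in place of the real table:

```sql
-- Hypothetical demo table standing in for the real one.
DROP TABLE IF EXISTS lock_demo;
CREATE TABLE lock_demo (id int PRIMARY KEY, delete_flag boolean NOT NULL);
INSERT INTO lock_demo
SELECT g, g % 10 <> 0 FROM generate_series(1, 1000) AS g;  -- 90% flagged

BEGIN;
DELETE FROM lock_demo WHERE delete_flag;
-- Locks this session holds on the table itself (index locks are separate rows).
SELECT mode, granted
FROM pg_locks
WHERE relation = 'lock_demo'::regclass
  AND pid = pg_backend_pid();
ROLLBACK;  -- undo the demo delete
```

The row for the table should show RowExclusiveLock. That mode conflicts only with SHARE and stronger lock modes, not with the ACCESS SHARE lock a plain SELECT takes, so readers are not blocked by it.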

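So, is it safe to delete everything in one command (i.e. DELETE FROM table WHERE delete_flag='t')? I am worried that if deleting a single row fails, triggering an enormous rollback, it will affect my ability to read from the table. Would it be wiser to delete in batches?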

Answer 1

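Indexes are typically useless for operations on 90% of all rows. A sequential scan will be faster either way. (Exotic exceptions apply.)
If you need to allow concurrent reads, you cannot take an exclusive lock on the table (a plain DROP INDEX takes an ACCESS EXCLUSIVE lock on the table, which blocks even SELECT). So you also cannot drop any indexes in the same transaction.
You could drop indexes in separate transactions to keep the duration of the exclusive lock to a minimum. In Postgres 9.2 or later you can also use DROP INDEX CONCURRENTLY, which only needs minimal locks. Afterwards, use CREATE INDEX CONCURRENTLY to rebuild the index in the background without blocking concurrent reads or writes.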
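A sketch of that sequence, assuming PostgreSQL 9.2 or later, a hypothetical demo table idx_demo, and a hypothetical index idx_demo_val_idx that readers of the surviving rows do not need during the cleanup. Each CONCURRENTLY statement must run on its own, outside an explicit transaction block (for example in psql's default autocommit mode):

```sql
-- Hypothetical demo table with an index the cleanup does not need.
DROP TABLE IF EXISTS idx_demo;
CREATE TABLE idx_demo (id int PRIMARY KEY, val int, delete_flag boolean NOT NULL);
INSERT INTO idx_demo
SELECT g, g * 7, g % 10 <> 0 FROM generate_series(1, 10000) AS g;
CREATE INDEX idx_demo_val_idx ON idx_demo (val);

-- Each statement below runs in its own transaction (autocommit).

-- 1. Drop the index without locking out concurrent SELECTs.
DROP INDEX CONCURRENTLY idx_demo_val_idx;

-- 2. The big delete, in a transaction of its own.
DELETE FROM idx_demo WHERE delete_flag;

-- 3. Rebuild the index in the background; reads and writes continue meanwhile.
CREATE INDEX CONCURRENTLY idx_demo_val_idx ON idx_demo (val);
```

If CREATE INDEX CONCURRENTLY fails partway through, it leaves an invalid index behind, which has to be dropped and rebuilt.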

If you have a stable condition that identifies the 10% (or less) of rows that stay, I suggest a partial index on just those rows to get the best of both worlds:

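- Read queries can access the table quickly (using the partial index) at all times.
- The big DELETE is not going to modify the partial index at all, since none of its rows are involved in the DELETE.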

CREATE INDEX foo ON tbl (some_id) WHERE delete_flag = FALSE;

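This assumes delete_flag is boolean. You have to include the same predicate in your queries (even if it seems logically redundant) to make sure Postgres can use the partial index.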
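A minimal sketch of the partial-index approach on a hypothetical demo table partial_demo (the index name foo and column some_id are taken from the statement above). The read query repeats the index predicate delete_flag = FALSE so the planner can prove the partial index applies; the EXPLAIN output depends on the data and statistics, so it is not reproduced here:

```sql
-- Hypothetical demo table: 90% of rows are flagged for deletion.
DROP TABLE IF EXISTS partial_demo;
CREATE TABLE partial_demo (id int PRIMARY KEY, some_id int, delete_flag boolean NOT NULL);
INSERT INTO partial_demo
SELECT g, g % 1000, g % 10 <> 0 FROM generate_series(1, 100000) AS g;

-- Index only the rows that survive the big DELETE.
CREATE INDEX foo ON partial_demo (some_id) WHERE delete_flag = FALSE;
ANALYZE partial_demo;

-- Repeating the predicate lets the planner match the partial index.
EXPLAIN SELECT * FROM partial_demo WHERE some_id = 40 AND delete_flag = FALSE;

-- Without the predicate, foo cannot be used: it lacks the flagged rows.
EXPLAIN SELECT * FROM partial_demo WHERE some_id = 40;

-- The big DELETE affects none of the rows covered by foo.
DELETE FROM partial_demo WHERE delete_flag;
```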

Answer 2

Delete in batches of a fixed size and sleep between deletions:

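-- ids of the rows to delete (the ... stands for your deletion condition)
create temp table t as
select id from tbl where ...;
create index on t(id);

-- COMMIT inside DO requires PostgreSQL 11+, and the block must not be
-- run inside an explicit transaction block (BEGIN ... COMMIT).
do $$
declare
    sleep      int := 5;      -- seconds to pause before each batch
    batch_size int := 10000;  -- rows deleted per transaction
    c          refcursor;
    cur_id     int := 0;
    seq_id     int := 0;      -- rows fetched in the current batch
    del_id     int := 0;      -- rows fetched in total
    ts         timestamptz;
begin
    <<top>>
    loop
        raise notice 'sleep % sec', sleep;
        perform pg_sleep(sleep);
        raise notice 'continue..';
        open c for select id from t order by id;
        <<inn>>
        loop
            fetch from c into cur_id;  -- cur_id becomes NULL when no rows are left
            seq_id := seq_id + 1;
            del_id := del_id + 1;
            if cur_id is null then
                -- last, partial batch: delete everything that is left and stop
                raise notice 'going to delete the rest: %', del_id;
                ts := clock_timestamp();  -- wall-clock time, not transaction start
                close c;
                delete from tbl tb using t where tb.id = t.id;
                delete from t;
                commit;
                raise notice 'ok: %', clock_timestamp() - ts;
                exit top;
            elsif seq_id >= batch_size then
                -- full batch: delete every id up to and including cur_id
                raise notice 'going to delete: %', del_id;
                ts := clock_timestamp();
                delete from tbl tb using t where t.id = tb.id and t.id <= cur_id;
                delete from t where id <= cur_id;
                close c;  -- close the cursor before ending the transaction
                commit;
                raise notice 'ok: %', clock_timestamp() - ts;
                seq_id := 0;
                exit inn;
            end if;
        end loop inn;
    end loop top;
end;
$$;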
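On PostgreSQL 11 or later the same idea can be written more compactly, without the temp table or the cursor, by repeatedly deleting a LIMITed subset of the flagged rows and committing after each batch. This is a variant, not the code above; it assumes a hypothetical demo table batch_demo with a boolean delete_flag, and like the code above it must not run inside an explicit transaction block:

```sql
-- Hypothetical demo table: 90% of rows are flagged for deletion.
DROP TABLE IF EXISTS batch_demo;
CREATE TABLE batch_demo (id int PRIMARY KEY, delete_flag boolean NOT NULL);
INSERT INTO batch_demo
SELECT g, g % 10 <> 0 FROM generate_series(1, 50000) AS g;

-- PostgreSQL 11+; must not run inside an explicit transaction block.
DO $$
DECLARE
    batch_size int := 10000;  -- rows deleted per transaction
    n          int;           -- rows deleted by the last batch
BEGIN
    LOOP
        DELETE FROM batch_demo
        WHERE id IN (SELECT id FROM batch_demo WHERE delete_flag LIMIT batch_size);
        GET DIAGNOSTICS n = ROW_COUNT;
        COMMIT;                  -- end this batch's transaction, releasing its locks
        EXIT WHEN n = 0;
        PERFORM pg_sleep(0.1);   -- short pause between batches
    END LOOP;
END;
$$;
```

Because each iteration searches for flagged rows again, later batches may have to skip over dead tuples left by earlier ones; walking an id range, as the code above does, avoids that.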

Tags: postgresql, indexing, locking, transactions, postgresql-performance