Optimize query with OFFSET on large table

Question

I have a table:

create table big_table (
id serial primary key,
-- other columns here
vote int
);

This table is large, roughly 70 million rows, and I want to run this query:

SELECT * FROM big_table
ORDER BY vote [ASC|DESC], id [ASC|DESC]
OFFSET x LIMIT n  -- I need this for pagination

As you may know, queries like this are very slow when x is a large number.

To improve performance I added indexes:

create index vote_order_asc on big_table (vote asc, id asc);

and

create index vote_order_desc on big_table (vote desc, id desc);

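EXPLAIN shows that the SELECT query above uses these indexes, but it is still very slow with a large offset.

How can I optimize queries with OFFSET on big tables? Maybe PostgreSQL 9.5 or even newer versions have some feature for this? I have searched but found nothing.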

Answer 1

Have you tried partitioning the table?

Ease of management, improved scalability and availability, and a reduction in blocking are common reasons to partition tables. Improving query performance is not a reason to employ partitioning, though it can be a beneficial side-effect in some cases. In terms of performance, it is important to ensure that your implementation plan includes a review of query performance. Confirm that your indexes continue to appropriately support your queries after the table is partitioned, and verify that queries using the clustered and nonclustered indexes benefit from partition elimination where applicable.

http://sqlperformance.com/2013/09/sql-indexes/partitioning-benefits

Answer 2

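A large OFFSET is always going to be slow. Postgres has to order all rows and count the visible ones up to your offset. To skip all previous rows directly, you could add an indexed row_number to the table (or create a MATERIALIZED VIEW that includes the row_number) and use WHERE row_number > x instead of OFFSET x.

However, this approach is only sensible for read-only (or mostly read-only) data. Implementing the same for table data that can change concurrently is more challenging. You need to start by defining the desired behavior exactly.

I suggest a different approach to pagination: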
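A minimal sketch of the row_number idea for mostly read-only data. The names big_table_numbered, big_table_numbered_rn_idx and rn, as well as the offset 1000000 and page size 20, are made up for illustration and are not from the question:

```sql
-- Snapshot of big_table with a precomputed position in (vote, id) order.
-- View, index, and column names are hypothetical.
CREATE MATERIALIZED VIEW big_table_numbered AS
SELECT row_number() OVER (ORDER BY vote, id) AS rn, *
FROM   big_table;

CREATE UNIQUE INDEX big_table_numbered_rn_idx ON big_table_numbered (rn);

-- Equivalent of "OFFSET 1000000 LIMIT 20", but resolved through the index.
SELECT *
FROM   big_table_numbered
WHERE  rn > 1000000
ORDER  BY rn
LIMIT  20;

-- The numbering is frozen at creation time; re-run after data changes.
REFRESH MATERIALIZED VIEW big_table_numbered;
```

The refresh rewrites the whole view, which is why this only pays off when the data changes rarely.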

SELECT *
FROM   big_table
WHERE  (vote, id) > (vote_x, id_x)  -- ROW values
ORDER  BY vote, id  -- needs to be deterministic
LIMIT  n;

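Where vote_x and id_x are taken from the last row of the previous page (for both DESC and ASC). Or from the first row if navigating backwards.

Comparing row values is supported by the index you already have - a feature that complies with the ISO SQL standard, but not every RDBMS supports it.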
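To make that concrete, here is a sketch with made-up values: assume the current page (ascending order) starts with the row (vote = 42, id = 1300) and ends with (vote = 42, id = 1337), and the page size is 20. None of these numbers come from the question:

```sql
-- Next page: continue after the LAST row of the current page.
SELECT *
FROM   big_table
WHERE  (vote, id) > (42, 1337)
ORDER  BY vote, id
LIMIT  20;

-- Previous page: start from the FIRST row of the current page,
-- walk the index backwards, then flip the result back to ascending order.
SELECT *
FROM  (
   SELECT *
   FROM   big_table
   WHERE  (vote, id) < (42, 1300)
   ORDER  BY vote DESC, id DESC
   LIMIT  20
   ) sub
ORDER  BY vote, id;
```

The outer ORDER BY in the second query only sorts the 20 rows of the page, so it should add very little cost.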

CREATE INDEX vote_order_asc ON big_table (vote, id);

Or for descending order:

SELECT *
FROM   big_table
WHERE  (vote, id) < (vote_x, id_x)  -- ROW values
ORDER  BY vote DESC, id DESC
LIMIT  n;

This can use the same index.
I suggest you declare your columns NOT NULL or acquaint yourself with the NULLS FIRST|LAST construct:

PostgreSQL sort by datetime asc, null first?
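A rough sketch of both options, assuming vote is the only nullable sort column (id is the primary key and therefore already NOT NULL). The index name vote_order_desc_nulls_last is made up for illustration:

```sql
-- Option 1: forbid NULLs. This fails if the column currently contains NULLs.
ALTER TABLE big_table ALTER COLUMN vote SET NOT NULL;

-- Option 2: keep vote nullable, but state the NULL placement explicitly
-- and index with the same ordering (hypothetical index name).
CREATE INDEX vote_order_desc_nulls_last
    ON big_table (vote DESC NULLS LAST, id DESC);

SELECT *
FROM   big_table
ORDER  BY vote DESC NULLS LAST, id DESC
LIMIT  20;
```

Note that a row-value comparison like (vote, id) > (vote_x, id_x) does not evaluate to true when vote is NULL, so with option 2 rows with a NULL vote would likely need separate handling when paging.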

Note two things in particular:

The ROW values in the WHERE clause cannot be replaced with separate member fields. WHERE (vote, id) > (vote_x, id_x) cannot be replaced with:
```
WHERE  vote >= vote_x  -- incorrect, do not use
AND    id   > id_x
```
That would rule out all rows with id <= id_x, while we only want to do that for the same vote and not for the next. The correct translation would be:
```
WHERE (vote = vote_x AND id > id_x) OR vote > vote_x
```
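... which doesn't play along with indexes as nicely, and gets increasingly complicated for more columns.

It would be simple for a single column, obviously. That's the special case I mentioned at the outset.
The technique does not work for mixed directions in ORDER BY like: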
```
ORDER  BY vote ASC, id DESC
```
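At least I can't think of a generic way to implement this as efficiently. If at least one of the two columns is a numeric type, you could use a functional index with an inverted value on (vote, (id * -1)) - and use the same expression in ORDER BY: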
```
ORDER  BY vote ASC, (id * -1) ASC
```
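Put together, that workaround might look like the sketch below. The index name vote_asc_id_desc and the values 42 / 1337 (last row of the previous page) are made up, and whether the planner actually picks the index for the row comparison is worth confirming with EXPLAIN:

```sql
-- Functional index storing id negated, so "id DESC" becomes "(id * -1) ASC".
-- Index name is hypothetical.
CREATE INDEX vote_asc_id_desc ON big_table (vote, (id * -1));

-- Page after the row (vote = 42, id = 1337) for ORDER BY vote ASC, id DESC.
-- The negated id of that row is -1337.
SELECT *
FROM   big_table
WHERE  (vote, id * -1) > (42, -1337)
ORDER  BY vote, (id * -1)
LIMIT  20;
```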
