Why is PostgreSQL not using the index properly?
Schema:
create table records(
id varchar,
updated_at bigint
);
create index index1 on records (updated_at, id);
Query. It iterates over recently updated records: fetch 10 records, remember the last one, then fetch the next 10, and so on.
select * from records
where updated_at > '1' or (updated_at = '1' and id > 'some-id')
order by updated_at, id
limit 10;
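It uses the index, but not wisely: it also applies a filter and processes a lot of records; see Rows Removed by Filter: 31575 in the query plan below.
The strange thing is that if you remove the or and keep either the left or the right condition, it works well for both. But when both conditions are combined with or, it seems unable to figure out how to apply the index properly.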
Limit (cost=0.42..19.03 rows=20 width=1336) (actual time=542.475..542.501 rows=20 loops=1)
-> Index Scan using index1 on records (cost=0.42..426791.29 rows=458760 width=1336) (actual time=542.473..542.494 rows=20 loops=1)
Filter: ((updated_at > '1'::bigint) OR ((updated_at = '1'::bigint) AND ((id)::text > 'some-id'::text)))
Rows Removed by Filter: 31575
Planning time: 0.180 ms
Execution time: 542.532 ms
(6 rows)
Postgres version is 9.6.
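The comparison described in the question (each half of the or condition on its own) can be reproduced with something like the sketch below. It assumes the records table and index1 from the schema above and reuses the placeholder values 1 and 'some-id'. In the resulting plans, predicates listed under Index Cond bound the part of the index that is scanned, while predicates listed under Filter are checked row by row after the entries are read.

```sql
-- Left-hand condition only: a range on the leading index column.
explain (analyze, buffers)
select * from records
where updated_at > 1
order by updated_at, id
limit 10;

-- Right-hand condition only: equality on updated_at plus a range on id.
explain (analyze, buffers)
select * from records
where updated_at = 1 and id > 'some-id'
order by updated_at, id
limit 10;
```

Comparing where each predicate appears (Index Cond versus Filter) against the plan above should show whether the or form is what pushes the condition into a filter.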
I would try it as two separate queries, combining their results like this:
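-- Each branch is parenthesized so that its ORDER BY and LIMIT
-- apply to that branch only, not to the whole UNION ALL.
select *
from
(
(
select *
from records
where updated_at > 1
order by updated_at, id
limit 10
)
union all
(
select *
from records
where updated_at = 1
and id > 'some-id'
order by updated_at, id
limit 10
)
) t
order by updated_at, id
limit 10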
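My guess is that each of these two queries can be optimized well on its own, and that running both will be more efficient than the current query.
I would also make those columns NOT NULL if possible.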
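As a sketch of the NOT NULL suggestion, and of one alternative way to express the same "next page" condition, the statements below use PostgreSQL's row-wise comparison. This is not part of the answer above; it is an assumption that it suits this table. With both columns NOT NULL, (updated_at, id) > (1, 'some-id') selects the same rows as the original or condition, and the planner may be able to use it as an index condition on index1; checking with explain is the way to confirm that on 9.6.

```sql
-- Assumption: no existing row has NULL in these columns;
-- otherwise SET NOT NULL fails and the data must be cleaned first.
alter table records alter column updated_at set not null;
alter table records alter column id set not null;

-- Row-wise comparison: true when (updated_at, id) sorts after
-- (1, 'some-id'), i.e. updated_at > 1,
-- or updated_at = 1 and id > 'some-id'.
select *
from records
where (updated_at, id) > (1, 'some-id')
order by updated_at, id
limit 10;
```

The values 1 and 'some-id' stand for the updated_at and id of the last row of the previous page.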
PostgreSQL has optimizations for how it uses multicolumn indexes. From the documentation:
For example, given an index on (a, b, c) and a query condition WHERE a = 5 AND b >= 42 AND c < 77,
the index would have to be scanned from the first entry with a = 5 and b = 42 up through the last entry with a = 5.
Index entries with c >= 77 would be skipped, but they'd still have to be scanned through. This index could in principle be used for
queries that have constraints on b and/or c with no constraint on a —
but the entire index would have to be scanned, so in most cases the
planner would prefer a sequential table scan over using the index.
https://www.postgresql.org/docs/9.6/static/indexes-multicolumn.html
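The documentation's example can be tried directly. The sketch below uses a hypothetical table abc_demo and index abc_demo_idx (neither is from the question) that mirror the (a, b, c) case. On an empty or very small table the planner may choose a sequential scan either way, so loading representative data and running analyze first gives more meaningful plans.

```sql
-- Hypothetical table and index mirroring the (a, b, c) example.
create table abc_demo (a int, b int, c int);
create index abc_demo_idx on abc_demo (a, b, c);

-- Equality on a plus a range on b bounds the scanned part of the index;
-- c < 77 is checked against the entries inside that range.
explain
select * from abc_demo
where a = 5 and b >= 42 and c < 77;

-- No constraint on the leading column a: using the index would mean
-- scanning all of it, so a sequential scan is usually preferred.
explain
select * from abc_demo
where b >= 42 and c < 77;
```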