Partial index not taking effect
Why am I getting a Seq Scan when I can see the partial index in the output of the \d+ command?
\d+ call_records;
id | integer | not null default nextval('call_records_id_seq'::regclass) | plain | |
plain_crn | bigint |
active | boolean | default true
timestamp | bigint | default 0
Indexes:
"index_call_records_on_plain_crn" UNIQUE, btree (plain_crn)
"index_call_records_on_active" btree (active) WHERE active = true
With id, I get an index scan, as expected.
EXPLAIN select * from call_records where id=1;
QUERY PLAN
----------------------------------------------------------------------------------------
Index Scan using call_records_pkey on call_records (cost=0.14..8.16 rows=1 width=373)
Index Cond: (id = 1)
(2 rows)
The same goes for plain_crn.
EXPLAIN select * from call_records where plain_crn=1;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Index Scan using index_call_records_on_plain_crn on call_records (cost=0.14..8.16 rows=1 width=373)
Index Cond: (plain_crn = 1)
(2 rows)
But in the case of active, it is different.
EXPLAIN select * from call_records where active=true;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on call_records (cost=0.00..12.00 rows=100 width=373)
Filter: active
(2 rows)
You should start by testing the cost of the index scan:
SET enable_seqscan = OFF;
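You will find that it is much higher than the cost of the seq scan. The total number of rows in your table is probably very low (the planner estimates rows=100 for the active filter). Because you select *, Postgres still has to look up every matching row in the table, so scanning all rows sequentially is cheaper than checking the index and then fetching most of the pages anyway.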
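The comparison described above can be sketched as follows. This is a minimal sketch, assuming the call_records table from the question; wrapping it in a transaction with SET LOCAL is a choice made here (not part of the original answer) so that the setting does not leak into the rest of the session.

```sql
-- Plan and estimated cost the planner picks on its own.
EXPLAIN select * from call_records where active = true;

begin;
-- Discourage sequential scans for this transaction only.
set local enable_seqscan = off;
-- Compare the estimated cost of this plan with the one above.
EXPLAIN select * from call_records where active = true;
-- Ending the transaction restores the previous enable_seqscan value.
rollback;
```

enable_seqscan = off does not forbid sequential scans outright; it makes the planner treat them as very expensive, so the second plan shows what the cheapest alternative would cost.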
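Whether PostgreSQL uses the index on "active" depends on the ratio of true to false. Once true outnumbers false by a large enough margin, the query planner will decide that a table scan is likely to be faster.
I built a table to test this, and loaded it with a million rows of random data.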
select active, count(*)
from call_records
group by active;
active count
--
f 499983
t 500017
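The setup that produced these counts is not shown. A minimal sketch of one way to build a comparable table follows; the column list is taken from the question, while the random() < 0.5 split, the plain_crn values, and the partial form of the index are assumptions made for illustration, and the exact counts will differ from run to run.

```sql
-- Hypothetical reconstruction of the test table; the original DDL is not shown.
create table call_records (
    id          serial primary key,
    plain_crn   bigint,
    active      boolean default true,
    "timestamp" bigint default 0
);

-- One million rows, with active true for roughly half of them.
insert into call_records (plain_crn, active)
select g, random() < 0.5
from generate_series(1, 1000000) as g;

-- An unnamed index on this column gets the default name call_records_active_idx.
-- Assumption: partial, like the index in the question.
create index on call_records (active) where active = true;

-- Refresh planner statistics before looking at plans.
analyze call_records;
```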
The numbers of true and false rows are roughly the same. Here is the execution plan.
explain analyze
select * from call_records where active=true;
"Bitmap Heap Scan on call_records (cost=5484.82..15344.49 rows=500567 width=21) (actual time=56.542..172.084 rows=500017 loops=1)"
" Filter: active"
" Heap Blocks: exact=7354"
" -> Bitmap Index Scan on call_records_active_idx (cost=0.00..5359.67 rows=250567 width=0) (actual time=55.040..55.040 rows=500023 loops=1)"
" Index Cond: (active = true)"
"Planning time: 0.105 ms"
"Execution time: 204.209 ms"
Then I updated "active", refreshed the statistics, and checked again.
update call_records
set active = true
where id < 750000;
analyze call_records;
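To see the true/false ratio the planner is working with after the analyze, the collected column statistics can be inspected. A minimal sketch, assuming the table lives in the public schema (adjust schemaname otherwise):

```sql
-- Planner statistics for the "active" column, as gathered by ANALYZE.
select attname,
       null_frac,
       most_common_vals,   -- the distinct values seen, e.g. t and f
       most_common_freqs   -- estimated fraction of rows for each value
from pg_stats
where schemaname = 'public'        -- assumption: default schema
  and tablename  = 'call_records'
  and attname    = 'active';
```

The frequency reported for true is what drives the row estimate in the plan; the higher it is, the less attractive the index becomes for a select * query.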
explain analyze
select * from call_records where active=true;
"Seq Scan on call_records (cost=0.00..22868.00 rows=874100 width=21) (actual time=0.032..280.506 rows=874780 loops=1)"
" Filter: active"
" Rows Removed by Filter: 125220"
"Planning time: 0.316 ms"
"Execution time: 337.400 ms"
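Turning off sequential scans shows that, in my case, PostgreSQL made the right decision. The table scan (sequential scan) was roughly 10 milliseconds faster (337.400 ms versus 349.403 ms).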
set enable_seqscan = off;
explain analyze
select * from call_records where active=true;
"Index Scan using call_records_active_idx on call_records (cost=0.42..39071.14 rows=874100 width=21) (actual time=0.031..293.295 rows=874780 loops=1)"
" Index Cond: (active = true)"
"Planning time: 0.343 ms"
"Execution time: 349.403 ms"