Why is this postgresql text_pattern_ops index not used inside of a function body?
I have created and populated a Postgres 9.6 table using this code:
create table text_table(id serial primary key , value text);
create index on text_table(lower(value) text_pattern_ops);
insert into text_table(value)
select md5(random()::text)
from generate_series(0, 1000000);
create or replace function search_text_table(term text) returns table(id int) as $$
begin
return query (select text_table.id from text_table where lower(value) like term);
end;
$$ language plpgsql;
-- Query 1
explain analyze select t.id from text_table t where lower(t.value) like 'aba%';
-- Query 2
explain analyze select id from search_text_table('aba%');
For the first query, the index on lower(value) is used to speed up the query:
Bitmap Heap Scan on text_table t (cost=216.95..8600.17 rows=5500 width=4) (actual time=0.162..0.798 rows=250 loops=1)
Filter: (lower(value) ~~ 'aba%'::text)
-> Bitmap Index Scan on text_table_lower_idx (cost=0.00..215.57 rows=5500 width=0) (actual time=0.094..0.094 rows=250 loops=1)
Index Cond: ((lower(value) ~>=~ 'aba'::text) AND (lower(value) ~<~ 'abb'::text))
Total runtime: 0.833 ms
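However, when the same code is executed as part of the search_text_table function, the query takes three orders of magnitude longer to run, so I assume the index is not being used: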
Function Scan on search_text_table (cost=0.25..10.25 rows=1000 width=4) (actual time=985.031..992.106 rows=68625 loops=1)
Total runtime: 994.515 ms
Why doesn't Postgres use the index when the term supplied to the LIKE operator is a function parameter rather than a constant string?
I cannot reproduce this, but I suspect you did something like the following:
CREATE TABLE text_table(
id serial PRIMARY KEY,
value text
);
CREATE INDEX ON text_table(lower(value) text_pattern_ops);
INSERT INTO text_table(value)
SELECT md5(random()::text)
FROM generate_series(0, 1000000);
CREATE FUNCTION search_text_table(term text)
RETURNS TABLE(id int) AS
$$BEGIN
RETURN QUERY (SELECT text_table.id
FROM text_table
WHERE lower(value) LIKE term);
END;$$
LANGUAGE plpgsql;
-- repeat a query like this 5 times
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('%abc%');
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('%abc%');
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('%abc%');
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('%abc%');
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('%abc%');
-- then run a query that could use the index
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
Let's see what PostgreSQL actually does:
-- requires being superuser
LOAD 'auto_explain';
SET log_min_messages = panic;
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_nested_statements = on;
SET client_min_messages = log;
SET auto_explain.log_analyze = on;
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
LOG: duration: 2033.747 ms plan:
Query Text: (select text_table.id from text_table where lower(value) like term)
Seq Scan on text_table (cost=0.00..23334.01 rows=5000 width=4) (actual time=4.374..2033.395 rows=246 loops=1)
Filter: (lower(value) ~~ $1)
Rows Removed by Filter: 999755
LOG: duration: 2034.259 ms plan:
Query Text: explain analyze select id from search_text_table('abc%');
Function Scan on search_text_table (cost=0.25..10.25 rows=1000 width=4) (actual time=2034.209..2034.240 rows=246 loops=1)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Function Scan on search_text_table (cost=0.25..10.25 rows=1000 width=4) (actual time=2034.209..2034.240 rows=246 loops=1)
Planning time: 0.194 ms
Execution time: 2034.353 ms
(3 rows)
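During the first five executions, PostgreSQL uses custom plans, that is, plans created for the statement with the actual parameter values.
On the sixth execution, it checks whether the custom plans chosen during the first five executions were better than a generic plan, which does not know the parameter values. I deliberately crafted my example so that they were not, so PostgreSQL decides to use the generic plan from then on.
This means that it will use a sequential scan no matter what the parameter is. You can see the generic plan in the auto_explain output above: note the $1 in the Filter condition, where a custom plan would show the actual value 'abc%'.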
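The same switch can be observed without a function, because PL/pgSQL statements go through the same plan cache as prepared statements. The following is a sketch. It assumes the text_table from the question, with its lower(value) text_pattern_ops index, already exists. The statement name search_stmt is made up for illustration. Whether and when the generic plan is actually picked depends on the cost estimates, so the final EXPLAIN may show either kind of plan.

```sql
-- Assumes text_table and its lower(value) text_pattern_ops index from the question.
-- search_stmt is a hypothetical name chosen for this example.
PREPARE search_stmt(text) AS
   SELECT id FROM text_table WHERE lower(value) LIKE $1;

-- Five executions with a pattern that cannot use the index:
-- each one gets a custom plan (a sequential scan).
EXECUTE search_stmt('%abc%');
EXECUTE search_stmt('%abc%');
EXECUTE search_stmt('%abc%');
EXECUTE search_stmt('%abc%');
EXECUTE search_stmt('%abc%');

-- From here on the server may use the cached generic plan.
-- If it does, the Filter shows $1 instead of 'abc%' and no index is used.
EXPLAIN EXECUTE search_stmt('abc%');
```

A custom plan would instead show the literal 'abc%' and an Index Cond on text_table_lower_idx. DEALLOCATE search_stmt; removes the statement again.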
If you run the experiment the way you showed it, something different happens.
Terminate the PostgreSQL connection and start a new one, so that PostgreSQL loses all cached query plans.
Then try again like this:
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
-- requires being superuser
LOAD 'auto_explain';
SET log_min_messages = panic;
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_nested_statements = on;
SET client_min_messages = log;
SET auto_explain.log_analyze = on;
EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
LOG: duration: 5.123 ms plan:
Query Text: (select text_table.id from text_table where lower(value) like term)
Bitmap Heap Scan on text_table (cost=4.62..70.57 rows=100 width=4) (actual time=0.272..4.889 rows=246 loops=1)
Filter: (lower(value) ~~ 'abc%'::text)
Heap Blocks: exact=242
-> Bitmap Index Scan on text_table_lower_idx (cost=0.00..4.59 rows=17 width=0) (actual time=0.184..0.184 rows=246 loops=1)
Index Cond: ((lower(value) ~>=~ 'abc'::text) AND (lower(value) ~<~ 'abd'::text))
LOG: duration: 6.289 ms plan:
Query Text: EXPLAIN (ANALYZE) SELECT id FROM search_text_table('abc%');
Function Scan on search_text_table (cost=0.25..10.25 rows=1000 width=4) (actual time=6.220..6.264 rows=246 loops=1)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Function Scan on search_text_table (cost=0.25..10.25 rows=1000 width=4) (actual time=6.220..6.264 rows=246 loops=1)
Planning time: 0.055 ms
Execution time: 6.398 ms
(3 rows)
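This time the custom plans from the first five executions are better than the generic plan with its sequential scan, so PostgreSQL keeps using custom plans for subsequent executions.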
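On 9.6, one way to rule out the generic plan inside the function is dynamic SQL. According to the PL/pgSQL documentation, EXECUTE re-plans the command on every execution with the current parameter values instead of caching a generic plan. Below is a minimal sketch. The name search_text_table_dyn is hypothetical, a variant of the question's function. The price is planning overhead on every call.

```sql
-- Hypothetical variant of search_text_table that uses dynamic SQL.
CREATE OR REPLACE FUNCTION search_text_table_dyn(term text)
RETURNS TABLE(id int) AS
$$BEGIN
   -- The query string is planned on every call, with the actual value
   -- of term passed in as $1, so a prefix pattern such as 'abc%'
   -- can still use the lower(value) text_pattern_ops index.
   RETURN QUERY EXECUTE
      'SELECT text_table.id FROM text_table WHERE lower(value) LIKE $1'
      USING term;
END;$$
LANGUAGE plpgsql;
```

With auto_explain enabled as above, EXPLAIN (ANALYZE) SELECT id FROM search_text_table_dyn('abc%'); should show the bitmap index scan regardless of how often the function was called before.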
Starting with PostgreSQL v12, you will be able to control this behavior with the parameter plan_cache_mode.
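Here is a minimal sketch for PostgreSQL 12 or later. Setting plan_cache_mode to force_custom_plan makes the server plan with the actual parameter values on every execution, at the cost of planning each time. The other values are auto (the default, which follows the five-executions rule described above) and force_generic_plan.

```sql
-- Requires PostgreSQL 12 or later.
SHOW plan_cache_mode;  -- 'auto' unless it has been changed

-- Plan every execution with the actual parameter values in this session,
-- so that calls like search_text_table('abc%') can use the index again.
SET plan_cache_mode = force_custom_plan;
```

To restrict the setting to this one function rather than the whole session, ALTER FUNCTION search_text_table(text) SET plan_cache_mode = force_custom_plan; should work as well.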