What kind of index should I create to make "WHERE col1 LIKE '0000%' AND col2 = 'somevalue'" faster?

Question

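I tried the following query to search within a quadtree using PostgreSQL's LIKE operator. The col3 column holds strings such as '0133002112300300320', each of which describes a path in the quadtree.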

CREATE TABLE table1
(col1 CHARACTER(9) NOT NULL,
 col2 INTEGER NOT NULL,
 col3 CHARACTER VARYING(64),
 col4 INTEGER NOT NULL,
 col5 DOUBLE PRECISION NOT NULL,
 PRIMARY KEY(col1,col2,col3));

-- Performs sequential search
SELECT col1,col2,col3,col4,col5
FROM table1
WHERE col1='somevalue' AND col2=0 AND col3 LIKE '01330021123003003%';

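The problem is that the PRIMARY KEY index I defined is not used for WHERE col1='somevalue' AND col2=0 AND col3 LIKE '01330021123003003%'. It seems that I can't use the LIKE operator with the AND operator at the same time if I want to use the created index.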

Is there any special index I can create to make the SELECT faster?

Answer 1

"It seems that I can't use LIKE operator with AND operator at the same time if you want to use the created index."

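An index can be used in this case. Here is a demonstration with your exact table and your exact query, on 100k rows of uniformly distributed random content: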

insert into table1 select 
   (random()*10000)::int,
   (random()*10000)::int,
    md5(random()::text),
    0,0 
    from generate_series(1,100000);

ANALYZE table1;

EXPLAIN ANALYZE SELECT col1,col2,col3,col4,col5
FROM table1
WHERE col1='somevalue' AND col2=0 AND col3 LIKE '01330021123003003%';

Result:

 Index Scan using table1_pkey on table1  (cost=0.00..8.32 rows=1 width=59) (actual time=0.022..0.022 rows=0 loops=1)
   Index Cond: ((col1 = 'somevalue'::bpchar) AND (col2 = 0))
   Filter: ((col3)::text ~~ '01330021123003003%'::text)
 Total runtime: 0.050 ms
(4 rows)

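The line "Index Scan using table1_pkey" shows that the index is used for this query. Note that only the col1 and col2 equalities appear in the Index Cond; the LIKE condition on col3 is applied as a Filter on the rows the index returns.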

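If that is not what happens with your dataset, the most plausible reason is that you are searching for values that are too common.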
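
One way to check whether the searched values are "too common" is to look at the statistics the planner itself uses. The following is a minimal sketch, assuming ANALYZE has already been run on table1; the pg_stats view and its columns are standard, but how to interpret the frequencies for your data is a judgment call:

```sql
-- Per-column statistics gathered by ANALYZE for the table in the question.
-- most_common_vals / most_common_freqs list the most frequent values and the
-- estimated fraction of rows each one covers.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'table1'
  AND attname IN ('col1', 'col2');
```

If the value used for col1 or col2 shows up with a large frequency, the planner may estimate that a sequential scan is cheaper than going through the index.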

Answer 2

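The first problem is that you are using a pattern-matching expression on a column that is not of type text. It would be better to declare col3 as text.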

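The second point is how the index is created. To use an index with pattern-matching expressions, the index generally has to be created in a special way, with a pattern operator class. See: http://www.postgresql.org/docs/9.1/static/indexes-opclass.html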
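
According to the linked documentation page, the pattern operator classes matter when the database does not use the standard "C" locale; with the "C" locale, an index using the default operator class can already serve prefix LIKE searches. A small sketch for checking which collation the current database was created with (this reads the standard pg_database catalog; databases that use a non-libc locale provider may need further checks, which are not covered here):

```sql
-- Show the collation of the database you are connected to.
-- A value of "C" or "POSIX" means the default btree operator class
-- can be used for LIKE 'prefix%' searches.
SELECT datname, datcollate
FROM pg_database
WHERE datname = current_database();
```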

Here is an example with generated data:

-- First, generate example data (10 million records):
drop table if exists tmp_example_record;
create table tmp_example_record as
select
id,
  floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||
    floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||
    floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||
    floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text||floor(random()*4)::text as quad_tree_path
from generate_series(1,10000000) id;


-- Create a copy of quad_tree_path; the index suited to pattern matching is built on this copy
alter table tmp_example_record add column quad_tree_path_copy text;
update tmp_example_record set quad_tree_path_copy = quad_tree_path;
-- Create the index with a special operator class
CREATE INDEX tmp_example_record_quad_tree_path_copy_index ON tmp_example_record (quad_tree_path_copy varchar_pattern_ops);


explain analyze
select * from tmp_example_record where quad_tree_path_copy like '212013223122333%';
-- about 10 ms
/*
"Index Scan using tmp_example_record_quad_tree_path_copy_index on tmp_example_record  (cost=0.56..8.58 rows=1000 width=86)"
 "  Index Cond: ((quad_tree_path_copy ~>=~ '212013223122333'::text) AND (quad_tree_path_copy ~<~ '212013223122334'::text))"
 "  Filter: (quad_tree_path_copy ~~ '212013223122333%'::text)"
 */

explain analyze
select * from tmp_example_record where quad_tree_path like '212013223122333%';
-- more than 2000 ms
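
Applied to the table from the question, the same idea might be sketched as follows. This is an illustration rather than a tested recommendation: the index name is hypothetical, and the sketch assumes the database uses a non-"C" locale (otherwise the existing primary key index may already be sufficient). The equality columns come first and the prefix-matched column last, so all three conditions can in principle be used as index conditions:

```sql
-- Hypothetical index name. col3 is CHARACTER VARYING, so the matching
-- pattern operator class is varchar_pattern_ops.
CREATE INDEX table1_col1_col2_col3_pattern_idx
    ON table1 (col1, col2, col3 varchar_pattern_ops);

-- Refresh planner statistics after creating the index.
ANALYZE table1;

-- Re-check the plan for the query from the question.
EXPLAIN ANALYZE
SELECT col1, col2, col3, col4, col5
FROM table1
WHERE col1 = 'somevalue'
  AND col2 = 0
  AND col3 LIKE '01330021123003003%';
```

If the index is picked up, the Index Cond line of the plan would be expected to include a range on col3 (using the ~>=~ and ~<~ operators, as in the plan shown above) in addition to the equalities on col1 and col2.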


postgresql

indexing

database-performance

sql-like