Trigram and ILIKE simultaneously
I have a column with a GIN index that uses gin_trgm_ops.
When I run a similarity search for the term mad, I get:
god-made
made
man
man-made
may
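But it misses some words, such as srimad.
I want to select the top 5 matches for ILIKE '%mad%' or 'mad%', then the top 5 trigram matches, and merge the results.
After implementing the solution:
My SQL query and its EXPLAIN output: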
EXPLAIN (COSTS OFF)
(SELECT word_similarity('mad',word), word FROM articles_words WHERE word ILIKE '%mad%' ORDER BY word_similarity('mad',word) DESC LIMIT 10)
UNION
(SELECT word_similarity('mad',word),word FROM articles_words WHERE word_similarity('mad',word) > 0.4 ORDER BY word_similarity('mad',word) DESC, word LIMIT 10)
QUERY PLAN
HashAggregate
  Group Key: (word_similarity('mad'::text, articles_words.word)), articles_words.word
  ->  Append
        ->  Limit
              ->  Sort
                    Sort Key: (word_similarity('mad'::text, articles_words.word)) DESC
                    ->  Bitmap Heap Scan on articles_words
                          Recheck Cond: (word ~~* '%mad%'::text)
                          ->  Bitmap Index Scan on words_idx
                                Index Cond: (word ~~* '%mad%'::text)
        ->  Limit
              ->  Sort
                    Sort Key: (word_similarity('mad'::text, articles_words_1.word)) DESC, articles_words_1.word
                    ->  Seq Scan on articles_words articles_words_1
                          Filter: (word_similarity('mad'::text, word) > '0.40000000000000002'::double precision)
There is also a problem with the UNION.
Results of the first query:
(SELECT word_similarity('mad',word), word FROM articles_words WHERE word ILIKE '%mad%' ORDER BY word_similarity('mad',word) DESC LIMIT 10)
0.75 man-made
0.75 made
0.75 god-made
0.5 srimad-bhagavatam
0.5 srimad
Results of the second query:
(SELECT word_similarity('mad',word),word FROM articles_words WHERE word_similarity('mad',word) > 0.4 ORDER BY word_similarity('mad',word) DESC, word LIMIT 10)
0.75 god-made
0.75 made
0.75 man-made
0.5 anti-material
0.5 half-man
0.5 magistrate
0.5 maha
0.5 maha-mantra
0.5 mahaprabhu
0.5 maharaja
I want the result to be:
0.75 man-made
0.75 made
0.75 god-made
0.5 srimad-bhagavatam
0.5 srimad
0.5 anti-material
0.5 half-man
0.5 magistrate
0.5 maha
0.5 maha-mantra
0.5 mahaprabhu
0.5 maharaja
But I get the rows in this order:
0.75 god-made
0.5 maha
0.5 anti-material
0.5 mahaprabhu
0.5 maharaja
0.5 srimad
0.5 half-man
0.5 magistrate
0.5 srimad-bhagavatam
0.75 made
0.75 man-made
0.5 maha-mantra
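You should use a GiST index instead: unlike a GIN trigram index, a GiST trigram index can also serve an ORDER BY on the <-> distance operator.
Given the following table: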
test=> TABLE trigram;
id | val
----+----------
1 | god-made
2 | made
3 | man
5 | man-made
4 | may
6 | srimad
...
You can create the index like this:
CREATE INDEX ON trigram USING gist (val gist_trgm_ops);
It can then be used in a query like this:
EXPLAIN (COSTS off)
(SELECT id, val
FROM trigram
WHERE val ILIKE '%mad%'
LIMIT 5)
UNION
(SELECT id, val
FROM trigram
ORDER BY val <-> 'mad'
LIMIT 5);
                                  QUERY PLAN
-------------------------------------------------------------------------------
 HashAggregate
   Group Key: trigram.id, trigram.val
   ->  Append
         ->  Limit
               ->  Index Scan using trigram_val_idx on trigram
                     Index Cond: (val ~~* '%mad%'::text)
         ->  Subquery Scan on "*SELECT* 2"
               ->  Limit
                     ->  Index Scan using trigram_val_idx on trigram trigram_1
                           Order By: (val <-> 'mad'::text)
(10 rows)
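On the ordering part of the question: UNION gives no guarantee about row order (the HashAggregate step in the plans discards whatever order the branches produced), so the combined result needs its own outer ORDER BY. The following is only a sketch against the question's articles_words table, not a tested solution. The names branch and merged are made up for illustration. It also swaps the word_similarity(...) > 0.4 filter for the <% operator, which a trigram index can support, at the cost of a slightly different comparison: <% matches at or above pg_trgm.word_similarity_threshold rather than strictly above 0.4.

```sql
-- Assumption: pg_trgm is installed and articles_words(word) has a trigram index.
-- <% compares against this setting (matches are >= the threshold).
SET pg_trgm.word_similarity_threshold = 0.4;

SELECT sim, word
FROM (
    -- Branch 1: substring matches, best word similarity first
    (SELECT word_similarity('mad', word) AS sim, word, 1 AS branch
     FROM articles_words
     WHERE word ILIKE '%mad%'
     ORDER BY sim DESC
     LIMIT 10)
    UNION ALL
    -- Branch 2: trigram matches; the operator form can use the index,
    -- whereas the function call in a WHERE clause cannot
    (SELECT word_similarity('mad', word) AS sim, word, 2 AS branch
     FROM articles_words
     WHERE 'mad' <% word
     ORDER BY sim DESC, word
     LIMIT 10)
) AS merged
-- Collapse words returned by both branches, remembering the first branch that found them
GROUP BY sim, word
-- Final order: best similarity first, ILIKE hits before trigram-only hits, then alphabetical
ORDER BY sim DESC, min(branch), word;
```

With this ordering, rows that tie on similarity and branch come out alphabetically, so the 0.75 group would not necessarily appear in the exact sequence listed in the question; a different tie-breaker can be put in the last ORDER BY position if that matters.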