Create full text search configuration with two dictionaries
I want to run a full text search on a PostgreSQL column using both the english_stem dictionary and the simple dictionary. I can do that like this:
ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH english_stem, simple;
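But this checks that the word is in both dictionaries. Is there a way to change this configuration so that a word can match one dictionary or the other?
Edit:
The reason I think the dictionaries are not checked in order is that searching for a partial word that should be found in the simple dictionary returns nothing.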
select * from ts_debug('english', 'gutter cleaning services');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+----------+----------------+--------------+----------
asciiword | Word, all ASCII | gutter | {english_stem} | english_stem | {gutter}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | cleaning | {english_stem} | english_stem | {clean}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | services | {english_stem} | english_stem | {servic}
select * from ts_debug('simple', 'gutter cleaning services');
alias | description | token | dictionaries | dictionary | lexemes
-----------+-----------------+----------+--------------+------------+------------
asciiword | Word, all ASCII | gutter | {simple} | simple | {gutter}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | cleaning | {simple} | simple | {cleaning}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | services | {simple} | simple | {services}
select name from categories where (to_tsvector('english_simple_conf', name) @@ (to_tsquery('english_simple_conf', 'cleani:*')));
name
------
(0 rows)
But searching for a partial word that is in the english dictionary returns a result as expected.
select name from categories where (to_tsvector('english_simple_conf', name) @@ (to_tsquery('english_simple_conf', 'clea:*')));
name
--------------------------
Gutter Cleaning Services
But this checks that the word is in both dictionaries.
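That is not correct. As noted in the docs (see the description of the dictionary_name parameter), the dictionaries are checked in order; the second dictionary is consulted only if the first one does not return a lexeme for the token. You can verify this with ts_debug():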
testdb=# ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH simple;
ALTER TEXT SEARCH CONFIGURATION
testdb=# select * from ts_debug('public.english_simple_conf', 'cars boats n0taword');
alias | description | token | dictionaries | dictionary | lexemes
-----------+--------------------------+----------+--------------+------------+------------
asciiword | Word, all ASCII | cars | {simple} | simple | {cars}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | boats | {simple} | simple | {boats}
blank | Space symbols | | {} | |
numword | Word, letters and digits | n0taword | {simple} | simple | {n0taword}
(5 rows)
testdb=# ALTER TEXT SEARCH CONFIGURATION english_simple_conf
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
WITH english_stem, simple;
ALTER TEXT SEARCH CONFIGURATION
testdb=# select * from ts_debug('public.english_simple_conf', 'cars boats n0taword');
alias | description | token | dictionaries | dictionary | lexemes
-----------+--------------------------+----------+-----------------------+--------------+------------
asciiword | Word, all ASCII | cars | {english_stem,simple} | english_stem | {car}
blank | Space symbols | | {} | |
asciiword | Word, all ASCII | boats | {english_stem,simple} | english_stem | {boat}
blank | Space symbols | | {} | |
numword | Word, letters and digits | n0taword | {simple} | simple | {n0taword}
(5 rows)
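The reason your last two queries behave differently is that english_stem stems 'Cleaning' to 'clean', so a search for 'cleani:*' will not match. Try adding the to_tsvector and to_tsquery expressions as columns and removing them from the WHERE clause; you will see that "Gutter Cleaning Services" is stemmed to 'clean':2 'gutter':1 'servic':3.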
testdb=# select to_tsvector('english_simple_conf', name), to_tsquery('english_simple_conf', 'cleani:*'), name from categories;
to_tsvector | to_tsquery | name
---------------------------------+------------+--------------------------
'clean':2 'gutter':1 'servic':3 | 'cleani':* | Gutter Cleaning Services
(1 row)
testdb=# select to_tsvector('english_simple_conf', name), to_tsquery('english_simple_conf', 'cleaning:*'), name from categories;
to_tsvector | to_tsquery | name
---------------------------------+------------+--------------------------
'clean':2 'gutter':1 'servic':3 | 'clean':* | Gutter Cleaning Services
(1 row)
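If you change the tsquery to search for cleaning:*, that gets stemmed as well and matches again. However, english_stem cannot work out that 'cleani' means 'clean' unless it also sees the 'ng'. So 'cleani' is kept as-is, with no stemming applied, and the result is a mismatch: the trailing i is still there in the tsquery but not in the tsvector. (Note that a Snowball stemmer such as english_stem accepts every word it is given, so for these token types nothing actually falls through to simple.)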
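To look at one dictionary in isolation rather than the whole configuration, ts_lexize() can be called on a single token. This is only a quick check, not part of the original answer; the tokens are the ones from the question.

```sql
-- Ask english_stem directly what it does with a whole word and with a prefix.
-- The ts_debug output above shows 'cleaning' being reduced to {clean}.
SELECT ts_lexize('english_stem', 'cleaning');

-- The truncated form is not reduced to 'clean', which is why 'cleani':* ends up in the tsquery.
SELECT ts_lexize('english_stem', 'cleani');

-- The simple dictionary only lowercases the token.
SELECT ts_lexize('simple', 'Cleaning');
```

ts_lexize() returns NULL when a dictionary does not recognize a token, which is the only case in which the next dictionary in the list would be consulted.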
Stemming does not work on arbitrary prefixes of a word, only on whole words; for prefix matching you would use a traditional left-anchored LIKE.
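If the goal is "match on the stemmed form or on the unstemmed form", one possible approach (a sketch, not something the answer above prescribes) is to OR together a match against the stemming configuration and a match against the built-in simple configuration. It assumes the categories(name) table and the english_simple_conf configuration from the question.

```sql
-- Sketch: accept a row if either the stemmed or the unstemmed lexemes match.
-- 'simple' keeps 'cleaning' unstemmed, so the prefix 'cleani':* can match it.
SELECT name
FROM categories
WHERE to_tsvector('english_simple_conf', name) @@ to_tsquery('english_simple_conf', 'cleani:*')
   OR to_tsvector('simple', name) @@ to_tsquery('simple', 'cleani:*');
```

The trade-off is that each side needs its own expression index if the table is large, and matches from the simple side do not benefit from stemming.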