Configure the highlighted part in Elasticsearch
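Main question
The user is searching for a name and enters part of it, say au, and the document with the text paul is found.
I would like the document to be highlighted as p<em>au</em>l.
How can I achieve this if I have a complex search query (a combination of match, prefix and wildcard clauses used to control relevance)?
Sub-question
When do the highlight settings type, boundary_scanner and boundary_chars from the documentation take effect? According to the tests described below, these settings do not change the highlighted part.
Attempt 1: wildcard query with the default analyzer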
PUT myindex
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}

POST myindex/_doc/1
{
  "name": "paul"
}

GET myindex/_search
{
  "query": {
    "wildcard": {"name": "*au*"}
  },
  "highlight": {
    "fields": {
      "name": {}
    },
    "type": "fvh",
    "boundary_scanner": "chars",
    "boundary_chars": "abcdefghijklmnopqrstuvwxyz.,!? \t\n"
  }
}
This search returns the highlight <em>paul</em>, but I need p<em>au</em>l.
Attempt 2: match query with an n-gram analyzer
This one works as described in the SO question: Highlighting part of word in elasticsearch
PUT myindexngram
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      },
      "analyzer": {
        "index_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "lowercase"
          ]
        },
        "search_term_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": "lowercase"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "index_ngram_analyzer",
        "term_vector": "with_positions_offsets"
      }
    }
  }
}

POST myindexngram/_doc/1
{
  "name": "paul"
}

GET myindexngram/_search
{
  "query": {
    "match": {"name": "au"}
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}
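This highlights p<em>au</em>l as desired, but:
- The highlighting depends on the query type, so combining match and wildcard again results in <em>paul</em>.
- The highlighting is not affected at all by the type, boundary_scanner and boundary_chars settings.
Elasticsearch version: 7.13.4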
Reply from the Elasticsearch team:
A highlighter works on terms, so only full terms can be highlighted - whatever are the terms in your index. In your second example, au could be highlighted, because it is a term in the index, which is not the case for your first example.
There is also an option to define your own highlight_query that could be different from the main query, but this could lead to unpredictable highlights.
https://discuss.elastic.co/t/configure-highlighted-part/295164
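The point of the reply, that only whole index terms can be highlighted, can be illustrated with a small toy model. The sketch below is not Elasticsearch code and calls no Elasticsearch API: whole_word_tokens, ngram_tokens and highlight are hypothetical helpers written for this illustration, the tokenizers are simplified (they split on whitespace only), and merging overlapping spans into one <em> block is an assumption chosen because it reproduces the results reported in the question.

```python
from typing import List, Set, Tuple

Token = Tuple[str, int, int]  # (term, start_offset, end_offset)


def whole_word_tokens(text: str) -> List[Token]:
    """Toy stand-in for the default analyzer: one lowercased term per word."""
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((word.lower(), start, end))
        pos = end
    return tokens


def ngram_tokens(text: str, min_gram: int = 2, max_gram: int = 3) -> List[Token]:
    """Toy stand-in for the n-gram tokenizer (min_gram=2, max_gram=3) plus lowercase."""
    tokens = []
    for word, w_start, _ in whole_word_tokens(text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    gram = word[start:start + size]
                    tokens.append((gram, w_start + start, w_start + start + size))
    return tokens


def highlight(text: str, tokens: List[Token], matched_terms: Set[str]) -> str:
    """Wrap every matched index term in <em>, merging overlapping or touching spans.

    The merging rule is an assumption of this toy model, not documented behavior.
    """
    spans = sorted((s, e) for term, s, e in tokens if term in matched_terms)
    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    out, pos = [], 0
    for s, e in merged:
        out.append(text[pos:s])
        out.append("<em>" + text[s:e] + "</em>")
        pos = e
    out.append(text[pos:])
    return "".join(out)


text = "paul"
# Attempt 1: the only index term is "paul", so *au* can only light up the whole word.
print(highlight(text, whole_word_tokens(text), {"paul"}))          # <em>paul</em>
# Attempt 2: "au" is itself an index term, so it can be highlighted alone.
print(highlight(text, ngram_tokens(text), {"au"}))                 # p<em>au</em>l
# match + wildcard on the n-gram field: *au* also matches "pau" and "aul".
print(highlight(text, ngram_tokens(text), {"au", "pau", "aul"}))   # <em>paul</em>
```

In this model, a highlight_query corresponds to passing a different matched_terms set to highlight than the one the main query produced, for example only {"au"} even when the main query also contains the wildcard clause. Whether that gives stable results on a real index is exactly the caveat raised in the reply.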