Configure highlighted part in Elasticsearch

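Main question
A user is searching for a name and types part of it, say au, and the document containing the text paul is found. I would like the document to be highlighted as p<em>au</em>l.
How can I achieve this with a complicated search query (a combination of match, prefix, and wildcard queries used to control relevance)?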

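Sub-question
When do the highlight settings type, boundary_scanner, and boundary_chars from the documentation take effect? In the tests described below, these settings did not change the highlighted part.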

Attempt 1: Wildcard query with the default analyzer

PUT myindex
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}
POST myindex/_doc/1
{
    "name": "paul"
}
GET myindex/_search
{
    "query": {
        "wildcard": {"name": "*au*"}
    },
    "highlight": {
        "fields": { 
            "name": {}
        },
        "type": "fvh",
        "boundary_scanner": "chars",
        "boundary_chars": "abcdefghijklmnopqrstuvwxyz.,!? \t\n"
    }
}

This search returns the highlight <em>paul</em>, but I need to get p<em>au</em>l.
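
To see which terms the name field actually holds in this index, the _analyze API can be run against the mapping above. This is a minimal sketch and not part of the original test. With the default standard analyzer it should return a single token, paul, spanning the whole word, which is consistent with the whole word being highlighted.

```json
GET myindex/_analyze
{
    "field": "name",
    "text": "paul"
}
```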

Attempt 2: Match query with an ngram analyzer
This one works as described in the SO question: Highlighting part of word in elasticsearch

PUT myindexngram
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": "2",
                    "max_gram": "3",
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            },
            "analyzer": {
                "index_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": [
                        "lowercase"
                    ]
                },
                "search_term_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "index_ngram_analyzer",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}
POST myindexngram/_doc/1
{
    "name": "paul"
}
GET myindexngram/_search
{
    "query": {
        "match": {"name": "au"}
    },
    "highlight": {
        "fields": { 
            "name": {}
        }
    }
}

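This highlights p<em>au</em>l as desired, but:

  1. Highlighting depends on the query type, so combining match and wildcard again results in <em>paul</em>.
  2. Highlighting is not affected at all by the type, boundary_scanner, and boundary_chars settings.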
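
The same check can be repeated for the ngram index. This is a sketch that assumes the myindexngram settings shown above. With min_gram 2 and max_gram 3, the tokenizer should split paul into the terms pa, pau, au, aul, and ul, so au exists as a term of its own here. A wildcard such as *au* can also match pau and aul, which may be why the combined query ends up highlighting the whole word.

```json
GET myindexngram/_analyze
{
    "analyzer": "index_ngram_analyzer",
    "text": "paul"
}
```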

Elastic version 7.13.4

Reply from the Elasticsearch team:

A highlighter works on terms, so only full terms can be highlighted, whatever the terms in your index are. In your second example, au could be highlighted because it is a term in the index, which is not the case in your first example. There is also an option to define your own highlight_query that differs from the main query, but this could lead to unpredictable highlights.
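
As an illustration of the highlight_query option mentioned in the reply, the sketch below runs a combined wildcard and match query against the ngram index from Attempt 2 but highlights with the match query only. The bool/should combination is an assumption standing in for the real, more complicated query. It has not been tested against this thread's data, and as the reply warns, highlights may not line up with what the main query matched.

```json
GET myindexngram/_search
{
    "query": {
        "bool": {
            "should": [
                { "wildcard": { "name": "*au*" } },
                { "match": { "name": "au" } }
            ]
        }
    },
    "highlight": {
        "fields": {
            "name": {
                "highlight_query": {
                    "match": { "name": "au" }
                }
            }
        }
    }
}
```

Because the highlighter only sees the match query, it works from the au term rather than from the longer terms pau and aul that the wildcard also matches.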

https://discuss.elastic.co/t/configure-highlighted-part/295164