Configure highlighted part in Elasticsearch

Question

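Main question
A user is looking for a name and types part of it, say au, and the document with the text paul is found. I would like the document to be highlighted as p<em>au</em>l.
How can I achieve this if I have a complex search query (a combination of match, prefix, and wildcard clauses used to control relevance)?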

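Sub-question
When do the highlighting settings type, boundary_scanner and boundary_chars from the documentation take effect? In the tests described below, these settings did not change the highlighted part.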

Attempt 1: Wildcard query with default analyzer

PUT myindex
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}

POST myindex/_doc/1
{
    "name": "paul"
}

GET myindex/_search
{
    "query": {
        "wildcard": {"name": "*au*"}
    },
    "highlight": {
        "fields": { 
            "name": {}
        },
        "type": "fvh",
        "boundary_scanner": "chars",
        "boundary_chars": "abcdefghijklmnopqrstuvwxyz.,!? \t\n"
    }
}

This search returns the highlight <em>paul</em>, but I need p<em>au</em>l.

Attempt 2: Match query with NGRAM analyzer
This works as described in the SO question: Highlighting part of word in elasticsearch

PUT myindexngram
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": "2",
                    "max_gram": "3",
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            },
            "analyzer": {
                "index_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": [
                        "lowercase"
                    ]
                },
                "search_term_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "index_ngram_analyzer",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}

POST myindexngram/_doc/1
{
    "name": "paul"
}

GET myindexngram/_search
{
    "query": {
        "match": {"name": "au"}
    },
    "highlight": {
        "fields": { 
            "name": {}
        }
    }
}

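This highlights p<em>au</em>l as desired, but:

The highlighting depends on the query type, so combining match and wildcard results in <em>paul</em> again.
The highlighting is completely unaffected by the type, boundary_scanner and boundary_chars settings.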
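
The difference between the two attempts can be reproduced with a small Python model. It assumes, as the reply further down states, that a highlighter marks whole indexed terms. The helpers standard_terms, ngram_terms and highlight are hypothetical names written for this sketch, not Elasticsearch APIs, and merging overlapping spans is a simplification rather than the real highlighter logic. The n-gram helper also treats the input as a single word.

```python
import re
from fnmatch import fnmatchcase


def standard_terms(text):
    """Whole-word terms with (start, end) offsets, roughly like the default analyzer."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def ngram_terms(text, min_gram=2, max_gram=3):
    """Character n-grams with offsets, using the min_gram/max_gram values from attempt 2."""
    lowered = text.lower()
    return [
        (lowered[i:i + n], i, i + n)
        for i in range(len(lowered))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(lowered)
    ]


def highlight(text, terms, matches):
    """Wrap every matching term in <em> tags, merging overlapping or touching spans."""
    spans = sorted((start, end) for term, start, end in terms if matches(term))
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    out, last = [], 0
    for start, end in merged:
        out.append(text[last:start] + "<em>" + text[start:end] + "</em>")
        last = end
    return "".join(out) + text[last:]


def is_au(term):
    return term == "au"


def is_wildcard_au(term):
    return fnmatchcase(term, "*au*")


# Attempt 1: the only indexed term is "paul", so the whole word is marked.
print(highlight("paul", standard_terms("paul"), is_wildcard_au))  # <em>paul</em>
# Attempt 2: "au" is an indexed term of its own with offsets 1..3.
print(highlight("paul", ngram_terms("paul"), is_au))              # p<em>au</em>l
# Wildcard on the n-gram index: "pau", "au" and "aul" all match and overlap.
print(highlight("paul", ngram_terms("paul"), is_wildcard_au))     # <em>paul</em>
```

In this model the n-gram index holds the terms pa, pau, au, aul and ul. A wildcard such as *au* matches three overlapping terms, which is one plausible reading of why adding a wildcard clause brings back the full-word highlight.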

Elasticsearch version 7.13.4

Answer 1

Reply from the Elasticsearch team:

A highlighter works on terms, so only full terms can be highlighted - whatever are the terms in your index. In your second example, au could be highlighted, because it is a term in the index, which is not the case for your first example. There is also an option to define your own highlight_query that could be different from the main query, but this could lead to unpredictable highlights.

https://discuss.elastic.co/t/configure-highlighted-part/295164
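
The highlight_query option mentioned in the reply could be tried along the following lines. This is a minimal sketch that only builds the request body in Python. The bool/should combination is an assumed stand-in for the "complex query" from the question, and the body is meant for GET myindexngram/_search against the n-gram index from attempt 2. Whether it yields p<em>au</em>l has not been verified here, and the reply itself warns that highlights may become unpredictable.

```python
import json

# Main query: both clauses influence matching and relevance
# (this particular combination is an illustrative assumption).
search_body = {
    "query": {
        "bool": {
            "should": [
                {"match": {"name": "au"}},
                {"wildcard": {"name": "*au*"}},
            ]
        }
    },
    "highlight": {
        "fields": {
            "name": {
                # Highlighting is driven by this query instead of the main one.
                "highlight_query": {"match": {"name": "au"}}
            }
        }
    },
}

# Print the JSON body so it can be pasted into a console request.
print(json.dumps(search_body, indent=4))
```

Keeping the wildcard clause out of highlight_query means only the match clause decides which terms are marked, while the main query still ranks documents with both clauses.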
