
Configure the highlighted part in Elasticsearch

A user is looking for a name and types part of it, say au, and finds a document with the text paul. I would like the document to be highlighted as p<em>au</em>l.

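When do the highlight settings type, boundary_scanner and boundary_chars from the documentation take effect? According to the tests I describe below, these settings do not change the highlighted part.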

Attempt 1: Wildcard query with the default analyzer

PUT myindex
{
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}

POST myindex/_doc/1
{
    "name": "paul"
}

GET myindex/_search
{
    "query": {
        "wildcard": {"name": "*au*"}
    },
    "highlight": {
        "fields": {
            "name": {}
        },
        "type": "fvh",
        "boundary_scanner": "chars",
        "boundary_chars": "abcdefghijklmnopqrstuvwxyz.,!? \t\n"
    }
}

This search returns the highlight <em>paul</em>, but I need to get p<em>au</em>l.

Attempt 2: Match query with an NGRAM analyzer
This one works as described in the SO question: Highlighting part of word in elasticsearch

PUT myindexngram
{
    "settings": {
        "analysis": {
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": "2",
                    "max_gram": "3",
                    "token_chars": [ ... ]
                }
            },
            "analyzer": {
                "index_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "ngram_tokenizer",
                    "filter": [ ... ]
                },
                "search_term_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "index_ngram_analyzer",
                "term_vector": "with_positions_offsets"
            }
        }
    }
}

POST myindexngram/_doc/1
{
    "name": "paul"
}

GET myindexngram/_search
{
    "query": {
        "match": {"name": "au"}
    },
    "highlight": {
        "fields": {
            "name": {}
        }
    }
}

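This highlights p<em>au</em>l as desired, but:

  1. The highlighting depends on the query type, so combining match and wildcard results in <em>paul</em> again.
  2. The highlighting is not affected at all by the type, boundary_scanner and boundary_chars settings.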
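
The difference between the two attempts can be reproduced outside Elasticsearch with a small model. The Python sketch below is not Elasticsearch's implementation. It assumes only that a highlighter wraps whole indexed terms using their character offsets and merges overlapping spans. The function names (`word_tokens`, `ngram_tokens`, `highlight`, `is_au`, `contains_au`) are made up for this illustration, and `ngram_tokens` ignores `token_chars` entirely.

```python
import re


def word_tokens(text):
    """Whole-word terms with (start, end) offsets, roughly what a default analyzer yields."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def ngram_tokens(text, min_gram=2, max_gram=3):
    """Character n-grams with offsets, grouped by start position (simplified model)."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end <= len(text):
                tokens.append((text[start:end].lower(), start, end))
    return tokens


def highlight(text, tokens, matches):
    """Wrap every token accepted by `matches` in <em>, merging overlapping or touching spans."""
    spans = sorted((start, end) for term, start, end in tokens if matches(term))
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    out, pos = [], 0
    for start, end in merged:
        out.append(text[pos:start])
        out.append("<em>" + text[start:end] + "</em>")
        pos = end
    out.append(text[pos:])
    return "".join(out)


def is_au(term):
    """Stand-in for a match query on the single term 'au'."""
    return term == "au"


def contains_au(term):
    """Stand-in for the wildcard *au*, which accepts every term containing 'au'."""
    return "au" in term


if __name__ == "__main__":
    text = "paul"
    print(highlight(text, word_tokens(text), contains_au))   # attempt 1
    print(highlight(text, ngram_tokens(text), is_au))        # attempt 2
    print(highlight(text, ngram_tokens(text), contains_au))  # wildcard on the n-gram field
```

In this model the whole-word index holds only the term paul, so nothing smaller can be wrapped. The n-gram index holds pa, pau, au, aul and ul, so a query for the term au marks exactly that span. A wildcard on the n-gram field also accepts pau and aul, and the merged spans cover the whole word, which is one plausible reading of point 1.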

Elastic version 7.13.4

Reply from the Elasticsearch team:

A highlighter works on terms, so only full terms can be highlighted, whatever the terms in your index are. In your second example, au could be highlighted because it is a term in the index, which is not the case in your first example. There is also an option to define your own highlight_query that is different from the main query, but this could lead to unpredictable highlights.
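
As a rough sketch of the highlight_query option mentioned in the reply, the Python snippet below builds a search body for the myindexngram index in which the main query combines match and wildcard while highlighting is driven by the match clause alone. The helper name `build_search_body` and the bool/should combination are assumptions made for illustration. The original post does not show how the two queries were combined, and this body was not run against a cluster, so the resulting highlights are not guaranteed.

```python
import json


def build_search_body(fragment):
    """Build a search body whose highlighting uses a separate highlight_query.

    The bool/should combination is an assumed way of combining the two
    query types. The field name "name" comes from the mappings in the question.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": fragment}},
                    {"wildcard": {"name": "*" + fragment + "*"}},
                ]
            }
        },
        "highlight": {
            "fields": {
                "name": {
                    # Only this query decides which terms get wrapped in <em>.
                    "highlight_query": {"match": {"name": fragment}}
                }
            }
        },
    }


if __name__ == "__main__":
    # The printed JSON can be pasted as the body of GET myindexngram/_search.
    print(json.dumps(build_search_body("au"), indent=4))
```

Because the highlighter then sees a different query than the one that selected the documents, a hit found only through the wildcard clause may come back with no highlight at all, which is the kind of unpredictability the reply warns about.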
