Make a full word have more score than an Edge NGram subset

Question

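I'm trying to get a higher score for a document that matches on its full name than for documents that only match through an Edge NGram subset with the same value.

So the results are: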

Pos Name              _score       _id

1   Baritone horn     7.56878     1786
2   Baritone ukulele  7.56878     2313
3   Bari              7.56878     2360
4   Baritone voice    7.56878     1787

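I expected the third one ("Bari") to have a higher score because it is the full name. However, the edge ngram decomposition means every other document also has the term "bari" indexed. As you can see in the results table, all of the scores are equal, and I don't even know how Elasticsearch is ordering them, since the _id values aren't sequential and the names aren't sorted either.

How can I achieve this?

Thanks

Example 'code'

Settings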

{
  "analysis": {
    "filter": {
      "edgeNGram_filter": {
        "type": "edgeNGram",
        "min_gram": 3,
        "max_gram": 20,
        "token_chars": [
          "letter",
          "digit",
          "punctuation",
          "symbol"
        ]
      }
    },
    "analyzer": {
      "edgeNGram_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding",
          "edgeNGram_filter"
        ]
      },
      "whitespace_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}

source
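
To see why every document ties, it can help to approximate what edgeNGram_analyzer emits at index time. The Python sketch below is only an illustration, not Elasticsearch code: the function names edge_ngrams and analyze are made up for this example, and the asciifolding step is left out for brevity.

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    """Return the prefixes of `token` with length between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text, min_gram=3, max_gram=20):
    """Rough stand-in for edgeNGram_analyzer: whitespace split, lowercase, edge n-grams."""
    terms = []
    for token in text.split():
        terms.extend(edge_ngrams(token.lower(), min_gram, max_gram))
    return terms


if __name__ == "__main__":
    for name in ("Baritone horn", "Bari"):
        print(name, "->", analyze(name))
```

Under these assumptions, the terms for "Baritone horn" include "bari", which is also what the search analyzer turns the query "Bari" into, so the prefix documents and the full-name document all match on the same indexed term.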

Mappings:

{
  "name": {
    "type": "string",
    "index": "not_analyzed"
  },
  "suggest": {
    "type": "completion",
    "index_analyzer": "edgeNGram_analyzer",
    "search_analyzer": "whitespace_analyzer",
    "payloads": true
  }
}

Query:

POST /attribute-tree/attribute/_search
{
  "query": {
    "match": {
      "suggest": "Bari"
    }
  }
}

Results:

(only the relevant data is kept)

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 7.56878,
    "hits": [
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "1786",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone horn",
          "suggest": {
            "input": [
              "Baritone",
              "horn"
            ],
            "output": "Baritone horn"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "2313",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone ukulele",
          "suggest": {
            "input": [
              "Baritone",
              "ukulele"
            ],
            "output": "Baritone ukulele"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "2360",
        "_score": 7.56878,
        "_source": {
          "name": "Bari",
          "suggest": {
            "input": [
              "Bari"
            ],
            "output": "Bari"
          }
        }
      },
      {
        "_index": "attribute-tree",
        "_type": "attribute",
        "_id": "1787",
        "_score": 7.56878,
        "_source": {
          "name": "Baritone voice",
          "suggest": {
            "input": [
              "Baritone",
              "voice"
            ],
            "output": "Baritone voice"
          }
        }
      }
    ]
  }
}

Answer 1

You can use the bool query and its should clause to add score to exact matches, like this:

POST /attribute-tree/attribute/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "suggest": "Bari"
          }
        }
      ],
      "should": [
        {
          "match": {
            "name": "Bari"
          }
        }
      ]
    }
  }
}
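
If the exact match needs more (or less) weight, the should clause can also carry a boost. The sketch below builds the same kind of request body in Python. The helper name build_query and the boost value of 2.0 are illustrative assumptions and not part of the query above; the long form of match with "query" and "boost" is standard query DSL.

```python
import json


def build_query(text, ngram_field="suggest", exact_field="name", boost=2.0):
    """Build a bool query: must match the n-gram field, should match the exact field."""
    return {
        "query": {
            "bool": {
                # Every hit has to match the edge n-gram field.
                "must": [{"match": {ngram_field: text}}],
                # Exact matches on the not_analyzed field add to the score.
                "should": [
                    {"match": {exact_field: {"query": text, "boost": boost}}}
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("Bari"), indent=2))
```

The printed JSON can be sent as the body of the same _search request.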

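The query in the should clause is what the Elasticsearch Definitive Guide calls a signal clause, and it is what lets you tell a perfect match from an ngram match. You still get every document that matches the must clause, but documents that also match the should query get a higher score because of the bool query scoring formula: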

score = ("must" queries total score + matching "should" queries total score) / (total number of "must" queries and "should" queries)
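
As a rough worked example of that formula, here is a small Python sketch. The per-clause scores are hypothetical numbers chosen for illustration, and real Lucene scoring applies further normalization, so the absolute values will not match an actual response.

```python
def bool_score(must_scores, matched_should_scores, total_should):
    """Sum the scores of the matching clauses and divide by the total clause count."""
    total_clauses = len(must_scores) + total_should
    return (sum(must_scores) + sum(matched_should_scores)) / total_clauses


# Hypothetical per-clause scores, for illustration only.
exact = bool_score([0.5], [1.5], total_should=1)   # matches must and should
prefix = bool_score([0.5], [], total_should=1)     # matches must only

if __name__ == "__main__":
    print(exact, prefix)
```

A document that also satisfies the should clause keeps the extra term in the numerator, while the denominator is the same for both, so it ranks higher.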

This gives the result you wanted, with Bari first (and far ahead on score :)):

"hits": {
      "total": 3,
      "max_score": 0.4339554,
      "hits": [
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "2360",
            "_score": 0.4339554,
            "_source": {
               "name": "Bari",
               "suggest": {
                  "input": [
                     "Bari"
                  ],
                  "output": "Bari"
               }
            }
         },
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "1786",
            "_score": 0.04500804,
            "_source": {
               "name": "Baritone horn",
               "suggest": {
                  "input": [
                     "Baritone",
                     "horn"
                  ],
                  "output": "Baritone horn"
               }
            }
         },
         {
            "_index": "attribute-tree",
            "_type": "attribute",
            "_id": "2313",
            "_score": 0.04500804,
            "_source": {
               "name": "Baritone ukulele",
               "suggest": {
                  "input": [
                     "Baritone",
                     "ukulele"
                  ],
                  "output": "Baritone ukulele"
               }
            }
         }
      ]
}

lucene

n-gram

elasticsearch