ElasticSearch score for "best fields" query does not work as expected

Question

I am trying to understand how ranking works. My index is defined with the "english" analyzer on all fields.

Here is my query:

GET test_index_1/study/_search/
{
  "query": {
    "multi_match": {
      "query": "stupid question",
      "type": "best_fields",
      "fields": ["description", "title", "questions.text"]
    }
  }
}

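The results are shown below. My test index contains only 3 documents.

I would like to know why the first document's score is about twice that of the second.

Intuitively, the "title" and "description" fields are "equal", so why does a match in "title" give a higher score?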

"hits": {
"total": 3,
"max_score": 1.7600523,
"hits": [
  {
    "_index": "test_index_1",
    "_type": "study",
    "_id": "AV28gnhD1DC3_uN8bTrd",
    "_score": 1.7600523,
    "_source": {
      "title": "stupid question",
      "description": "test test",
      "questions": [
        {
          "text": "stupid text"
        }
      ]
    }
  },
  {
    "_index": "test_index_1",
    "_type": "study",
    "_id": "AV28gomD1DC3_uN8bTre",
    "_score": 0.84339964,
    "_source": {
      "title": "test test",
      "description": "stupid question",
      "questions": [
        {
          "text": "stupid text"
        }
      ]
    }
  },
  {
    "_index": "test_index_1",
    "_type": "study",
    "_id": "AV28gpPT1DC3_uN8bTrf",
    "_score": 0.84339964,
    "_source": {
      "title": "test test",
      "description": "stupid question",
      "questions": [
        {
          "text": "no text"
        }
      ]
    }
  }
]
}

Thanks in advance for any hints.

Answer 1

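Elasticsearch uses an inverted index and TF-IDF, so a word counts for more the fewer documents it appears in. These statistics are kept per field. Across all titles, the words "stupid" and "question" appear only once (in the first result only), but across all descriptions they appear twice (in the second and third results). A match on "stupid question" in the title is therefore worth more, because the terms are rarer there. That is why the first document gets the higher score.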
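
To make the rarity argument concrete, here is a rough sketch of the per-field IDF arithmetic. It rests on several assumptions that are not stated in the question: that the index uses BM25 (the default similarity since Elasticsearch 5.0) rather than classic TF-IDF, that the term statistics cover all three documents (for example a single shard), and that plain whitespace splitting is a good enough stand-in for the "english" analyzer on these short strings. The helper names bm25_idf and doc_freq are made up for illustration.

```python
import math

def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF part of BM25 as Lucene computes it: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Field values copied from the three hits in the question.
docs = [
    {"title": "stupid question", "description": "test test"},
    {"title": "test test", "description": "stupid question"},
    {"title": "test test", "description": "stupid question"},
]

def doc_freq(field: str, term: str) -> int:
    """Number of documents whose `field` contains `term` (whitespace tokens, an assumption)."""
    return sum(term in doc[field].split() for doc in docs)

# "stupid" occurs in 1 of 3 titles but in 2 of 3 descriptions.
idf_title = bm25_idf(len(docs), doc_freq("title", "stupid"))
idf_description = bm25_idf(len(docs), doc_freq("description", "stupid"))

print(f"idf(title:stupid)       = {idf_title:.4f}")
print(f"idf(description:stupid) = {idf_description:.4f}")
print(f"idf ratio               = {idf_title / idf_description:.4f}")
print(f"reported score ratio    = {1.7600523 / 0.84339964:.4f}")
```

Under these assumptions both ratios come out at roughly 2.09, which is consistent with the rarer title terms accounting for the score gap. The same holds for "question", which has the same document frequencies. To see the real numbers on a live index, the original search can be run with "explain": true in the request body, which reports the idf and tf components for each field.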
