Elasticsearch should query returns different scores

Question

I am retrieving documents by filtering and applying scores with a bool query. For example:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "color": "Yellow"
          }
        },
        {
          "term": {
            "color": "Red"
          }
        },
        {
          "term": {
            "color": "Blue"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

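If a document contains only "Yellow", it gets a score of "1.5", but if it contains only "Red", it gets a score of "1.4". I want the scores to be the same. Each document has exactly one match, so why do the scores differ? Is there anything in the should query that makes it ignore the order of the terms? When there is only one match, the "Yellow" match always gets the higher score...

Update: the problem is not the order of the terms in the should array, but the "number of documents containing the term".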

Answer 1

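If scoring does not matter to you, you can wrap the bool/should clause in a filter clause.

Filter context skips scoring entirely and behaves as a plain yes/no query, so the score of every matching document will always be 0.0.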

{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            {
              "term": {
                "color.keyword": "Yellow"
              }
            },
            {
              "term": {
                "color.keyword": "Black"
              }
            },
            {
              "term": {
                "color.keyword": "Purple"
              }
            }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}

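The score of a matching document depends on several factors, such as field length, term frequency, and the total number of documents.

You can use the explain API to see how the score is calculated: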

GET /_search?explain=true
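
With explain=true, each hit carries an _explanation tree whose nodes have value, description, and details fields. Here is a minimal Python sketch for pulling specific nodes out of such a tree. The function name find_nodes is made up for illustration, and the sample tree is hand-written rather than real cluster output.

```python
def find_nodes(explanation, keyword):
    """Return every node whose description starts with `keyword` (depth-first)."""
    matches = []
    if explanation.get("description", "").startswith(keyword):
        matches.append(explanation)
    for child in explanation.get("details", []):
        matches.extend(find_nodes(child, keyword))
    return matches


# Hand-written sample shaped like an _explanation tree (not real output).
sample = {
    "value": 1.5,
    "description": "sum of:",
    "details": [
        {
            "value": 1.5,
            "description": "weight(color:Yellow), result of:",
            "details": [
                {"value": 1.6, "description": "idf, computed as ...", "details": []},
                {"value": 0.45, "description": "tf, computed as ...", "details": []},
            ],
        },
    ],
}

# Collect only the idf nodes and print their values.
idf_nodes = find_nodes(sample, "idf")
print([node["value"] for node in idf_nodes])
```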

Answer 2

@ESCoder Using the example above, I get:

"Yellow"

{
  "value" : 1.5995531,
  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
  "details" : [
    {
      "value" : 30,
      "description" : "n, number of documents containing term",
      "details" : [ ]
    },
    {
      "value" : 150,
      "description" : "N, total number of documents with field",
      "details" : [ ]
    }
  ]
},

"Red"

{
  "value" : 1.0375981,
  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
  "details" : [
    {
      "value" : 53,
      "description" : "n, number of documents containing term",
      "details" : [ ]
    },
    {
      "value" : 150,
      "description" : "N, total number of documents with field",
      "details" : [ ]
    }
  ]
},

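Each term (Red and Yellow) appears only once per document. I want the same score whether a document has Red or Yellow. I don't care how many documents contain each term. If one document has only Yellow and another has only Red, I want both to get the same score. Is that possible?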
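
The two idf values can be reproduced from the formula printed in the explain output, which shows that the gap comes entirely from n (30 documents contain Yellow, 53 contain Red) while N is the same. Here is a quick check in Python, where the function name idf is just a label for this sketch:

```python
import math


def idf(n, N):
    # Formula copied from the explain output:
    # log(1 + (N - n + 0.5) / (n + 0.5)), using the natural logarithm.
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


yellow = idf(n=30, N=150)  # n and N taken from the "Yellow" explain output
red = idf(n=53, N=150)     # n and N taken from the "Red" explain output
print(yellow, red)
```

The results should agree with the reported values 1.5995531 and 1.0375981 up to floating-point rounding: the rarer term gets the larger idf, and therefore the higher score.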

Answer 3

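As others have mentioned, the score depends on many factors. However, if you want to ignore all of them, you can use constant_score to assign a fixed score whenever a document matches a particular term, for example: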

{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Yellow"
              }
            },
            "boost": 1
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Red"
              }
            },
            "boost": 1
          }
        },
        {
          "constant_score": {
            "filter": {
              "term": {
                "color": "Blue"
              }
            },
            "boost": 1
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

I believe this should meet your requirements.
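
If the list of colors changes often, the query body can be generated rather than written by hand. This is a sketch in Python, assuming the field is named color as in the question; build_same_score_query is a hypothetical helper, not part of any Elasticsearch client.

```python
def build_same_score_query(field, values, boost=1):
    """Build a bool/should query where each matching term contributes `boost`."""
    should = [
        # One constant_score clause per value, mirroring the JSON above.
        {"constant_score": {"filter": {"term": {field: value}}, "boost": boost}}
        for value in values
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = build_same_score_query("color", ["Yellow", "Red", "Blue"])
print(body)
```

Note that a document matching several of the terms still receives the sum of the matching boosts, so scores are equal only among documents that match the same number of terms.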
