Elasticsearch fuzzy queries ignore boost factor?

When I run this query:

GET /index_for_test/_search
{
    "query": {
        "multi_match": {
            "query":       "Italian",
            "type":        "most_fields",
            "fields":      [ "name^2", "categories" ]
        }
    }
}

it returns this result:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.04012554,
      "hits": [
         {
            "_index": "index_for_test",
            "_type": "business",
            "_id": "1269493995",
            "_score": 0.04012554,
            "_source": {
               "name": "Bono Italian Restaurant",
               "categories": [
                  "Pizza"
               ]
            }
         },
         {
            "_index": "index_for_test",
            "_type": "business",
            "_id": "2017788160",
            "_score": 0.014542127,
            "_source": {
               "name": "Pizza Perperook",
               "categories": [
                  "Italian Food"
               ]
            }
         }
      ]
   }
}

But when I add fuzziness to the query:

GET /index_for_test/_search
{
    "query": {
        "multi_match": {
            "query":       "Italian",
            "type":        "most_fields",
            "fields":      [ "name^2", "categories" ],
            "fuzziness":2
        }
    }
}

it ignores the boost factor and returns this result:

{
   "took": 28,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.095891505,
      "hits": [
         {
            "_index": "index_for_test",
            "_type": "business",
            "_id": "2017788160",
            "_score": 0.095891505,
            "_source": {
               "name": "Pizza Perperook",
               "categories": [
                  "Italian Food"
               ]
            }
         },
         {
            "_index": "index_for_test",
            "_type": "business",
            "_id": "1269493995",
            "_score": 0.076713204,
            "_source": {
               "name": "Bono Italian Restaurant",
               "categories": [
                  "Pizza"
               ]
            }
         }
      ]
   }
}

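Since I boost the name field by a factor of 2 (by listing it as name^2), I expect the same ordering as the first query, but the boost factor seems to be ignored.

I tried other query types (query_string, fuzzy_like_this) and ran into the same problem.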

Edit:

GET /index_for_test/_search?explain=true
{
    "query": {
        "multi_match": {
            "query":       "پیتزا",
            "type":        "most_fields",
            "fields":      [ "name^2", "categories" ]
        }
    }
}

Results of the fuzzy search with ?explain=true:

{
   "took": 25,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0.05015693,
      "hits": [
         {
            "_shard": 1,
            "_node": "ZTZ37EpAR1W9e4Qqwk0O5Q",
            "_index": "index_for_test",
            "_type": "business",
            "_id": "2017788160",
            "_score": 0.05015693,
            "_source": {
               "name": "پیتزا پرپروک",
               "categories": [
                  "غذای ایتالیایی"
               ]
            },
            "_explanation": {
               "value": 0.05015693,
               "description": "product of:",
               "details": [
                  {
                     "value": 0.10031386,
                     "description": "sum of:",
                     "details": [
                        {
                           "value": 0.10031386,
                           "description": "weight(name:پیتزا^2.0 in 0) [PerFieldSimilarity], result of:",
                           "details": [
                              {
                                 "value": 0.10031386,
                                 "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                                 "details": [
                                    {
                                       "value": 0.5230591,
                                       "description": "queryWeight, product of:",
                                       "details": [
                                          {
                                             "value": 2,
                                             "description": "boost"
                                          },
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 0.8522964,
                                             "description": "queryNorm"
                                          }
                                       ]
                                    },
                                    {
                                       "value": 0.19178301,
                                       "description": "fieldWeight in 0, product of:",
                                       "details": [
                                          {
                                             "value": 1,
                                             "description": "tf(freq=1.0), with freq of:",
                                             "details": [
                                                {
                                                   "value": 1,
                                                   "description": "termFreq=1.0"
                                                }
                                             ]
                                          },
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 0.625,
                                             "description": "fieldNorm(doc=0)"
                                          }
                                       ]
                                    }
                                 ]
                              }
                           ]
                        }
                     ]
                  },
                  {
                     "value": 0.5,
                     "description": "coord(1/2)"
                  }
               ]
            }
         },
         {
            "_shard": 2,
            "_node": "ZTZ37EpAR1W9e4Qqwk0O5Q",
            "_index": "index_for_test",
            "_type": "business",
            "_id": "1269493995",
            "_score": 0.023267403,
            "_source": {
               "name": "رستوران ایتالیایی بونو",
               "categories": [
                  "پیتزا"
               ]
            },
            "_explanation": {
               "value": 0.023267403,
               "description": "product of:",
               "details": [
                  {
                     "value": 0.046534806,
                     "description": "sum of:",
                     "details": [
                        {
                           "value": 0.046534806,
                           "description": "weight(categories:پیتزا in 0) [PerFieldSimilarity], result of:",
                           "details": [
                              {
                                 "value": 0.046534806,
                                 "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                                 "details": [
                                    {
                                       "value": 0.15165187,
                                       "description": "queryWeight, product of:",
                                       "details": [
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 0.49421698,
                                             "description": "queryNorm"
                                          }
                                       ]
                                    },
                                    {
                                       "value": 0.30685282,
                                       "description": "fieldWeight in 0, product of:",
                                       "details": [
                                          {
                                             "value": 1,
                                             "description": "tf(freq=1.0), with freq of:",
                                             "details": [
                                                {
                                                   "value": 1,
                                                   "description": "termFreq=1.0"
                                                }
                                             ]
                                          },
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 1,
                                             "description": "fieldNorm(doc=0)"
                                          }
                                       ]
                                    }
                                 ]
                              }
                           ]
                        }
                     ]
                  },
                  {
                     "value": 0.5,
                     "description": "coord(1/2)"
                  }
               ]
            }
         },
         {
            "_shard": 3,
            "_node": "ZTZ37EpAR1W9e4Qqwk0O5Q",
            "_index": "index_for_test",
            "_type": "business",
            "_id": "1203656733",
            "_score": 0.023267403,
            "_source": {
               "name": "چمن",
               "categories": [
                  "پیتزا"
               ]
            },
            "_explanation": {
               "value": 0.023267403,
               "description": "product of:",
               "details": [
                  {
                     "value": 0.046534806,
                     "description": "sum of:",
                     "details": [
                        {
                           "value": 0.046534806,
                           "description": "weight(categories:پیتزا in 0) [PerFieldSimilarity], result of:",
                           "details": [
                              {
                                 "value": 0.046534806,
                                 "description": "score(doc=0,freq=1.0 = termFreq=1.0\n), product of:",
                                 "details": [
                                    {
                                       "value": 0.15165187,
                                       "description": "queryWeight, product of:",
                                       "details": [
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 0.49421698,
                                             "description": "queryNorm"
                                          }
                                       ]
                                    },
                                    {
                                       "value": 0.30685282,
                                       "description": "fieldWeight in 0, product of:",
                                       "details": [
                                          {
                                             "value": 1,
                                             "description": "tf(freq=1.0), with freq of:",
                                             "details": [
                                                {
                                                   "value": 1,
                                                   "description": "termFreq=1.0"
                                                }
                                             ]
                                          },
                                          {
                                             "value": 0.30685282,
                                             "description": "idf(docFreq=1, maxDocs=1)"
                                          },
                                          {
                                             "value": 1,
                                             "description": "fieldNorm(doc=0)"
                                          }
                                       ]
                                    }
                                 ]
                              }
                           ]
                        }
                     ]
                  },
                  {
                     "value": 0.5,
                     "description": "coord(1/2)"
                  }
               ]
            }
         }
      ]
   }
}

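The boost isn't being ignored. You're just adding a fuzzy component to the score, and that is changing the overall ordering. If you run the query with ?explain=true, you'll get a debug dump of how each score is built up.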

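Your first query requires exact matches. Combined with most_fields, the scoring is fairly simple: the documents with the most exactly matching fields score highest.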
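As a rough mental model, the Elasticsearch documentation describes a most_fields multi_match as one match query per field inside a bool/should, with the scores of the matching clauses added together. This description comes from the documentation, not from anything shown in this thread. A field boost such as name^2 becomes that clause's boost. The Python sketch below only builds that equivalent request body as a dict; the helper name most_fields_as_bool is hypothetical. Passing fuzziness shows that the fuzzy setting lands on every per-field clause, not just one.

```python
import json

def most_fields_as_bool(text, fields, fuzziness=None):
    """Hypothetical helper: rewrite a most_fields multi_match as the
    bool/should of per-field match queries it roughly corresponds to."""
    should = []
    for spec in fields:
        field, _, boost = spec.partition("^")   # "name^2" -> ("name", "^", "2")
        match = {"query": text}
        if boost:
            match["boost"] = float(boost)       # field boost becomes the clause boost
        if fuzziness is not None:
            match["fuzziness"] = fuzziness      # fuzziness applies to every field clause
        should.append({"match": {field: match}})
    return {"query": {"bool": {"should": should}}}

exact = most_fields_as_bool("Italian", ["name^2", "categories"])
fuzzy = most_fields_as_bool("Italian", ["name^2", "categories"], fuzziness=2)
print(json.dumps(exact, indent=2))
print(json.dumps(fuzzy, indent=2))
```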

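Your second query introduces fuzziness with an edit distance of 2. That means any word within two character edits of the query term will match, which can change the number of matching tokens dramatically.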
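To see why fuzziness 2 can pull in extra terms, here is a plain Levenshtein edit-distance check in Python. It is a simplification. Depending on version and settings, Elasticsearch's fuzzy matching may also count an adjacent transposition as a single edit, and it is further limited by options such as prefix_length and max_expansions. None of these are modeled here. The term list assumes the names were lowercased by the analyzer. The terms italia, italians and italy are hypothetical extras, not from the question's data.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions (cost 1 each)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free when characters are equal)
            ))
        prev = cur
    return prev[-1]

def fuzzy_matches(query_term, index_terms, fuzziness=2):
    """Index terms within `fuzziness` edits of the query term."""
    return [t for t in index_terms if levenshtein(query_term, t) <= fuzziness]

# Lowercased terms from the question's documents, plus hypothetical extras.
terms = ["bono", "italian", "restaurant", "pizza", "perperook", "food",
         "italia", "italians", "italy"]
print(fuzzy_matches("italian", terms, fuzziness=0))
print(fuzzy_matches("italian", terms, fuzziness=2))
```

With fuzziness 0 only the exact term survives. With fuzziness 2, every term one or two edits away also matches and contributes its own score, which is how the fuzzy component can reshuffle the ranking.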

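If you post the explain debug output, I can help analyze it and give you a clearer explanation. But basically the answer is: boosting still works, and your scores are changing only because of the fuzzy matches.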
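The explain output posted in the question can be reproduced by hand. The Python sketch below assumes the classic Lucene TF-IDF similarity that these explain trees appear to come from:

- idf = 1 + ln(maxDocs / (docFreq + 1))
- tf = sqrt(termFreq)
- queryWeight = boost * idf * queryNorm
- fieldWeight = tf * idf * fieldNorm
- the summed clause scores are then multiplied by coord(matched/total)

The fieldNorm values are copied from the explain output rather than computed. One part is my own reading, not something the output states: queryNorm also counts the non-matching field's clause, whose idf is 1 because docFreq is 0 on that one-document shard. That reading does reproduce the queryNorm values shown.

```python
import math

def idf(doc_freq, max_docs):
    # Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def query_norm(clause_weights):
    # 1 / sqrt(sum of (boost * idf)^2) over every clause of the query
    return 1.0 / math.sqrt(sum(w * w for w in clause_weights))

def term_score(boost, term_idf, qnorm, freq, field_norm):
    query_weight = boost * term_idf * qnorm                   # "queryWeight, product of"
    field_weight = math.sqrt(freq) * term_idf * field_norm    # "fieldWeight in 0, product of"
    return query_weight * field_weight

def coord(matched, total):
    # Fraction of the query's clauses that matched this document
    return matched / total

# Doc 2017788160 (shard 1, maxDocs=1): term found in name (docFreq=1), not in categories (docFreq=0).
name_idf1, cat_idf1 = idf(1, 1), idf(0, 1)
qn1 = query_norm([2.0 * name_idf1, 1.0 * cat_idf1])
score_2017788160 = term_score(2.0, name_idf1, qn1, freq=1, field_norm=0.625) * coord(1, 2)

# Doc 1269493995 (shard 2, maxDocs=1): term found in categories only.
name_idf2, cat_idf2 = idf(0, 1), idf(1, 1)
qn2 = query_norm([2.0 * name_idf2, 1.0 * cat_idf2])
score_1269493995 = term_score(1.0, cat_idf2, qn2, freq=1, field_norm=1.0) * coord(1, 2)

print(f"shard 1: queryNorm={qn1:.7f} score={score_2017788160:.8f}")
print(f"shard 2: queryNorm={qn2:.8f} score={score_1269493995:.9f}")
```

The results should land on the explain values (queryNorm 0.8522964 and 0.49421698, scores 0.05015693 and 0.023267403), up to float rounding. The 2.0 boost clearly enters queryWeight, so it is applied. Note also that each document sits on its own shard with maxDocs=1, so idf and queryNorm are computed from very different per-shard statistics.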

Following Zach's suggestion, I changed the query to this, which gives me the result I wanted:

GET /index_for_test/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query":       "Italian",
                        "type":        "most_fields",
                        "fields":      [ "name^2", "categories" ],
                        "boost":       10
                    }
                },
                {
                    "multi_match": {
                        "query":       "Italian",
                        "type":        "most_fields",
                        "fields":      [ "name^2", "categories" ],
                        "fuzziness":   2
                    }
                }
            ]
        }
    }
}
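If the combined query has to be generated from application code, a small helper can produce the same body. This Python sketch only assembles the JSON of the final query; it does not send any request. The function name build_exact_plus_fuzzy and its parameters are hypothetical. The defaults mirror the post: an exact most_fields clause boosted by 10, plus an unboosted fuzzy clause with fuzziness 2.

```python
import json

def build_exact_plus_fuzzy(text, fields=("name^2", "categories"), exact_boost=10, fuzziness=2):
    """Hypothetical helper: bool/should of a boosted exact most_fields clause
    and an unboosted fuzzy most_fields clause, as in the query above."""
    exact = {
        "multi_match": {
            "query": text,
            "type": "most_fields",
            "fields": list(fields),
            "boost": exact_boost,      # exact matches get a much larger weight
        }
    }
    fuzzy = {
        "multi_match": {
            "query": text,
            "type": "most_fields",
            "fields": list(fields),
            "fuzziness": fuzziness,    # tolerates up to `fuzziness` edits per term
        }
    }
    return {"query": {"bool": {"should": [exact, fuzzy]}}}

body = build_exact_plus_fuzzy("Italian")
print("GET /index_for_test/_search")
print(json.dumps(body, indent=4, ensure_ascii=False))
```

The apparent intent of this design is as follows. A document that matches exactly scores on both clauses, with the exact one weighted 10 times. A document that only matches fuzzily gets just the fuzzy clause's score. Exact matches therefore tend to rank first, while typo-tolerant matches still show up.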