Elasticsearch: fuzzy matching on multiple fields and sorting by their combined score

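I'm using Elasticsearch in Laravel, and my index has 3 fields: text, mood, and haloha_id. First, I want to match on "haloha_id" (think of haloha_id as a post and text as a comment on that post); only if it matches should any further matching happen. Assuming "haloha_id" matches, I want to match a substring in the "text" field and then match "mood" (an integer: 0, 1, 2, etc.). "mood" should only be matched when there is some match on "text", otherwise not. I'm building a "Like Mine" query, meaning that only comments matching the user's own comment on a particular post are shown. The problem with my query is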

Here is my query.

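 "query" => [
     "bool" => [
         // Fuzzy match on the comment text
         "should" => [
             [
                 "match" => [
                     "text" => [
                         "query" => $userHaloha->filtered_text,
                         "fuzziness" => "AUTO",
                     ],
                 ],
             ],
         ],
         "minimum_should_match" => 1,
         // Each clause gets its own array: two "match" keys in the same
         // PHP array would overwrite each other, leaving only the last one
         "must" => [
             ["match" => ["mood" => $userHaloha->mood]],
             ["match" => ["haloha_id" => $userHaloha->haloha_id]],
         ],
     ],
 ],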

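The query is self-explanatory. I put "haloha_id" in the filter block (which does not score documents), "text" in the must block (which scores documents), and "mood" in the should block (which boosts documents).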

{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "haloha_id": "5ecf6bff25a36366cd134db2"
          }
        }
      ],
      "must": [
        {
          "match": {
            "text": {
              "query": "chilli ",
              "fuzziness": "auto"
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "mood": {
              "value": 2
            }
          }
        }
      ]
    }
  }
}
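Since the question uses Laravel, the same query can also be written as a PHP array. The following is a minimal sketch, not the asker's actual code: the function name `buildLikeMineQuery` and its parameters are hypothetical, and it only builds the request body and prints it as JSON so it can be compared with the query above.

```php
<?php
// Hypothetical helper: builds the bool query shown above as a PHP array.
// The function name and parameter names are illustrative assumptions.
function buildLikeMineQuery(string $halohaId, string $text, int $mood): array
{
    return [
        'query' => [
            'bool' => [
                // filter: restricts results to one post, adds nothing to the score
                'filter' => [
                    ['match' => ['haloha_id' => $halohaId]],
                ],
                // must: the comment text has to match (fuzzily) and is scored
                'must' => [
                    ['match' => ['text' => ['query' => $text, 'fuzziness' => 'AUTO']]],
                ],
                // should: optional, a matching mood only raises the score
                'should' => [
                    ['term' => ['mood' => ['value' => $mood]]],
                ],
            ],
        ],
    ];
}

$body = buildLikeMineQuery('5ecf6bff25a36366cd134db2', 'chilli', 2);
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

Each clause sits in its own nested array, so repeated keys such as "match" cannot overwrite one another. With the official elasticsearch-php client, an array like this would typically be passed as the `body` of a `$client->search([...])` call; the index name is not given in the question, so it is left out here.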

The problem of mood:3 documents ranking higher than mood:2 documents (2 being the value searched for in the should clause) is caused by sharding.

From the docs:

If you notice that two documents with the same content get different scores or that an exact match is not ranked first, then the issue might be related to sharding. By default, Elasticsearch makes each shard responsible for producing its own scores. However since index statistics are an important contributor to the scores, this only works well if shards have similar index statistics. The assumption is that since documents are routed evenly to shards by default, then index statistics should be very similar and scoring would work as expected. However in the event that you either use routing at index time, query multiple indices, or have too little data in your index, then there are good chances that all shards that are involved in the search request do not have similar index statistics and relevancy could be bad.

If you have a small dataset, the easiest way to work around this issue is to index everything into an index that has a single shard (index.number_of_shards: 1), which is the default. Then index statistics will be the same for all documents and scores will be consistent.
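For a small dataset, the single-shard setup described in the docs can be requested when the index is created. The sketch below only builds the settings array; the index name `haloha_comments` is made up for illustration, and the commented-out call assumes the official elasticsearch-php client.

```php
<?php
// Hypothetical index name; the real one is not stated in the question.
$params = [
    'index' => 'haloha_comments',
    'body'  => [
        'settings' => [
            // One primary shard: every document is scored against the same
            // index statistics, so scores are consistent.
            'number_of_shards' => 1,
        ],
    ],
];

// Assuming $client is an elasticsearch-php client instance:
// $client->indices()->create($params);

echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

Note that `number_of_shards` is fixed when an index is created, so an existing multi-shard index would have to be reindexed into a new single-shard one.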