Tuning ranking in search results

from elasticsearch import Elasticsearch

# Assumption: a local single-node cluster; adjust the URL for your setup.
es = Elasticsearch("http://localhost:9200")

doc1 = {"id": 1, "text": "tonight"}
doc2 = {"id": 2, "text": "tonight tonight"}
doc3 = {"id": 3, "text": "tonight tonight tonight"}
doc4 = {"id": 4, "text": "tonight and something else"}
doc5 = {"id": 5, "text": "tonight and you"}

es.index(index="tonight", document=doc1)
es.index(index="tonight", document=doc2)
es.index(index="tonight", document=doc3)
es.index(index="tonight", document=doc4)
es.index(index="tonight", document=doc5)

Suppose I have indexed the documents above. When I run the following query:

import json

data = json.dumps({
    "query":{
        "bool":{
            "should":[
                {
                    "match":{
                        "text": "tonight"
                    }
                }
            ]
        }
    }
})

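The hits come back in this order: "tonight tonight tonight", "tonight tonight", "tonight", "tonight and you", and "tonight and something else".

Is there a way to make "tonight" come back first, with the highest _score?

In my actual use case, I am iterating over the whole index to find, for each document, the most relevant texts other than itself. If possible, the document being searched for should be returned as the first hit (the best match).

Can anyone give me some ideas on how to write the query?

Thanks!!!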

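If you are using the default Elasticsearch index mapping, you can use a term query on the text.keyword field. Otherwise, you can add multi-fields to the text field using the Update Mapping API.

A term query returns documents whose field value exactly matches the search term.

You can include the term query in the bool should clause next to the match query. This raises the score of the exactly matching document relative to the other documents.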
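For the case where the index does not already have a text.keyword sub-field, the mapping change might look roughly like the sketch below. The helper name `keyword_subfield_properties` is made up for illustration, and the `ignore_above` value simply mirrors what dynamic mapping generates by default. The client calls in the comments assume a recent version of the Python client and a connected `es` object as in the question.

```python
def keyword_subfield_properties(field="text", subfield="keyword"):
    """Build a mapping `properties` body that adds a keyword sub-field to a text field.

    Hypothetical helper for illustration only; not part of any Elasticsearch library.
    """
    return {
        field: {
            "type": "text",
            "fields": {
                # Same shape that dynamic mapping produces for string values.
                subfield: {"type": "keyword", "ignore_above": 256}
            },
        }
    }


properties = keyword_subfield_properties()

# With a connected client `es` (assumed), this could be applied as:
#   es.indices.put_mapping(index="tonight", properties=properties)
# Documents indexed before the mapping change do not populate the new
# sub-field until they are re-processed, for example with:
#   es.update_by_query(index="tonight")
```

After that, text.keyword can be queried the same way as with the default mapping.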

{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "text": "tonight"
                    }
                },
                {
                    "term": {
                        "text.keyword": "tonight"
                    }
                }
            ]
        }
    }
}
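Since the question loops over the whole index, it may help to build this body from a function. The following is only a sketch: `exact_first_query` and its parameters are names invented here, and the optional `boost` is an extra knob that the query above does not use.

```python
import json


def exact_first_query(text, field="text", keyword_field="text.keyword", boost=None):
    """Build a bool/should query: full-text match plus an exact-match term clause.

    Hypothetical helper; `boost` optionally weights the exact-match clause further.
    """
    if boost is None:
        term_clause = {"term": {keyword_field: text}}
    else:
        term_clause = {"term": {keyword_field: {"value": text, "boost": boost}}}
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: text}},
                    term_clause,
                ]
            }
        }
    }


# Same body as the query above, serialized the way the question does it.
data = json.dumps(exact_first_query("tonight"))

# With a connected client `es` (assumed), the query part could be sent as:
#   es.search(index="tonight", query=exact_first_query("tonight")["query"])
```

For the "most relevant text other than itself" use case, one option is to run this for each document's text and skip the first hit when its _id is the document being searched for.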

The search result will be:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 5,
            "relation": "eq"
        },
        "max_score": 1.5025805,
        "hits": [
            {
                "_index": "stof",
                "_id": "1",
                "_score": 1.5025805,
                "_source": {
                    "id": 1,
                    "text": "tonight"
                }
            },
            {
                "_index": "stof",
                "_id": "3",
                "_score": 0.13236837,
                "_source": {
                    "id": 3,
                    "text": "tonight tonight tonight"
                }
            },
            {
                "_index": "stof",
                "_id": "2",
                "_score": 0.12794474,
                "_source": {
                    "id": 2,
                    "text": "tonight tonight"
                }
            },
            {
                "_index": "stof",
                "_id": "5",
                "_score": 0.08185939,
                "_source": {
                    "id": 5,
                    "text": "tonight and you"
                }
            },
            {
                "_index": "stof",
                "_id": "4",
                "_score": 0.07130444,
                "_source": {
                    "id": 4,
                    "text": "tonight and something else"
                }
            }
        ]
    }
}
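To see why the exact match wins by such a margin, the scores can be approximated by hand. The sketch below assumes the default BM25 similarity (k1 = 1.2, b = 0.75, with the (k1 + 1) factor), a single shard, and plain whitespace splitting as a stand-in for the standard analyzer. The function names are made up for illustration. In text.keyword each document is one distinct value, so "tonight" is rare there and gets a large idf, while in the text field all five documents contain it.

```python
import math

# Documents from the question, keyed by id.
docs = {
    1: "tonight",
    2: "tonight tonight",
    3: "tonight tonight tonight",
    4: "tonight and something else",
    5: "tonight and you",
}

K1, B = 1.2, 0.75  # assumed default BM25 parameters


def idf(n_docs, doc_freq):
    """BM25 inverse document frequency."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def match_score(term, text, all_texts):
    """Approximate score of a match clause on the analyzed text field."""
    tokens = text.split()
    freq = tokens.count(term)
    if freq == 0:
        return 0.0
    avg_len = sum(len(t.split()) for t in all_texts) / len(all_texts)
    doc_freq = sum(term in t.split() for t in all_texts)
    length_norm = K1 * (1 - B + B * len(tokens) / avg_len)
    return (K1 + 1) * idf(len(all_texts), doc_freq) * freq / (freq + length_norm)


def term_score(value, text, all_texts):
    """Approximate score of a term clause on the keyword sub-field.

    Keyword fields have no length normalization, so with freq = 1 the
    tf part is 1 / (1 + K1), which cancels the (K1 + 1) factor.
    """
    if text != value:
        return 0.0
    doc_freq = sum(t == value for t in all_texts)
    return (K1 + 1) * idf(len(all_texts), doc_freq) * 1 / (1 + K1)


texts = list(docs.values())
scores = {
    doc_id: match_score("tonight", text, texts) + term_score("tonight", text, texts)
    for doc_id, text in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [1, 3, 2, 5, 4]
```

Under these assumptions the computed values agree with the _score values in the response above to about four decimal places, and almost all of document 1's score comes from the term clause.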