Rank records on the basis of a field value in Elasticsearch
I have a field distribution in my record schema, as follows:
...
"distribution": {
    "properties": {
        "availability": {
            "type": "keyword"
        }
    }
}
...
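I want records with distribution.availability == "ondemand" to rank lower than other records.
I looked through the Elasticsearch docs but could not find a way to lower the score of such records at index time so that they appear lower in search results.
How can I achieve this? Pointers to relevant sources would also be enough.
More info:
With the Python client, I was omitting these ondemand records entirely at query time, like this: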
from elasticsearch_dsl.query import Q
_query = Q("query_string", query=query_string) & ~Q('match', **{'availability.keyword': 'ondemand'})
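Now I want to include these records, but rank them lower than the other records.
If this cannot be done at index time, please suggest how to achieve it at query time with the Python client.
After applying llermaly's suggestion, the Python client query looks like this: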
boosting_query = Q(
    "boosting",
    positive=Q("match_all"),
    negative=Q(
        "bool", filter=[Q({"term": {"distribution.availability.keyword": "ondemand"}})]
    ),
    negative_boost=0.5,
)
if query_string:
    _query = Q("query_string", query=query_string) & boosting_query
else:
    _query = Q() & boosting_query
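One thing to watch in the query above: combining query_string and the boosting query with & puts both into a bool must, and the scores of must clauses are added together. An ondemand record then only loses a fixed 0.5 relative to the others, rather than having its text relevance score multiplied by 0.5. A minimal sketch of an alternative, assuming you want the multiplier applied to the text score itself, is to use the user's query as the positive clause. The helper name build_demoting_query and its defaults are made up for illustration. The field path is a parameter because the mapping in the question declares distribution.availability as keyword directly, in which case a .keyword sub-field may not exist.

```python
from typing import Any, Dict, Optional


def build_demoting_query(
    query_string: Optional[str],
    field: str = "distribution.availability",  # assumption: adjust to your mapping
    value: str = "ondemand",
    negative_boost: float = 0.5,
) -> Dict[str, Any]:
    """Build a search body whose boosting query demotes matching records.

    The user's query (or match_all when it is empty) is the positive clause,
    so negative_boost multiplies the relevance score of that query.
    """
    if query_string:
        positive: Dict[str, Any] = {"query_string": {"query": query_string}}
    else:
        positive = {"match_all": {}}

    return {
        "query": {
            "boosting": {
                "positive": positive,
                # Records matching this clause are kept but scored lower.
                "negative": {"term": {field: value}},
                "negative_boost": negative_boost,
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_demoting_query("climate data"), indent=2))
```

The returned dict is an ordinary search request body, so it can be sent with whichever client you already use.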
EDIT 2: elasticsearch-dsl-py version of the boosting query
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from elasticsearch_dsl import Q
client = Elasticsearch()
q = Q('boosting', positive=Q("match_all"), negative=Q('bool', filter=[Q({"term": {"test.available.keyword": "ondemand"}})]), negative_boost=0.5)
s = Search(using=client, index="test_parths007").query(q)
response = s.execute()
print(response)
for hit in response:
    print(hit.meta.score, hit.test.available)
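Edit: I just read that you need to do this at index time.
Elasticsearch deprecated index-time boosting in 5.0:
https://www.elastic.co/guide/en/elasticsearch/reference/7.11/mapping-boost.html
You can use a Boosting query to achieve the same effect at query time.
Ingest documents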
POST test_parths007/_doc
{
  "name": "doc1",
  "test": {
    "available": "ondemand"
  }
}
POST test_parths007/_doc
{
  "name": "doc1",
  "test": {
    "available": "higherscore"
  }
}
POST test_parths007/_doc
{
  "name": "doc2",
  "test": {
    "available": "higherscore"
  }
}
Query (query time)
POST test_parths007/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match_all": {}
      },
      "negative": {
        "term": {
          "test.available.keyword": "ondemand"
        }
      },
      "negative_boost": 0.5
    }
  }
}
Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_parths007",
        "_type" : "_doc",
        "_id" : "VMdY7XcB50NMsuQPelRx",
        "_score" : 1.0,
        "_source" : {
          "name" : "doc2",
          "test" : {
            "available" : "higherscore"
          }
        }
      },
      {
        "_index" : "test_parths007",
        "_type" : "_doc",
        "_id" : "Vcda7XcB50NMsuQPiVRB",
        "_score" : 1.0,
        "_source" : {
          "name" : "doc1",
          "test" : {
            "available" : "higherscore"
          }
        }
      },
      {
        "_index" : "test_parths007",
        "_type" : "_doc",
        "_id" : "U8dY7XcB50NMsuQPdlTo",
        "_score" : 0.5,
        "_source" : {
          "name" : "doc1",
          "test" : {
            "available" : "ondemand"
          }
        }
      }
    ]
  }
}
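To see where the 1.0 and 0.5 in this response come from: match_all gives every document a score of 1.0, and the boosting query multiplies that score by negative_boost for documents that also match the negative clause. The following is only a toy re-computation in plain Python, not Elasticsearch code. The function boosted_score and the hard-coded document list are illustrative assumptions that mirror the three documents ingested above.

```python
from typing import List, Tuple


def boosted_score(positive_score: float, matches_negative: bool,
                  negative_boost: float = 0.5) -> float:
    """Mimic the boosting query: scale the score only for negative matches."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# (name, test.available) pairs mirroring the ingested documents.
docs: List[Tuple[str, str]] = [
    ("doc1", "ondemand"),
    ("doc1", "higherscore"),
    ("doc2", "higherscore"),
]

# match_all scores every document 1.0 before the negative boost is applied.
scored = [
    (boosted_score(1.0, available == "ondemand"), name, available)
    for name, available in docs
]

# Highest score first, as in a search response.
ranked = sorted(scored, key=lambda row: row[0], reverse=True)

for score, name, available in ranked:
    print(score, name, available)
```

With a real text query as the positive clause, the same multiplication applies to that query's relevance score, so a strongly matching ondemand record can still outrank a weakly matching one.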
For more advanced score manipulation, you can look at the Function Score Query.
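As a rough idea of what that could look like, here is a sketch of a function_score body that applies a weight of 0.5 to ondemand records and leaves all other scores unchanged. This is an assumption about how you might use it, not something taken from the setup above. The helper name build_function_score_query is hypothetical, and the field path should be checked against your mapping.

```python
from typing import Any, Dict, Optional


def build_function_score_query(
    query_string: Optional[str],
    field: str = "distribution.availability",  # assumption: adjust to your mapping
    value: str = "ondemand",
    weight: float = 0.5,
) -> Dict[str, Any]:
    """Build a function_score search body that down-weights matching records."""
    if query_string:
        base: Dict[str, Any] = {"query_string": {"query": query_string}}
    else:
        base = {"match_all": {}}

    return {
        "query": {
            "function_score": {
                "query": base,
                "functions": [
                    # The weight applies only to documents passing this filter.
                    {"filter": {"term": {field: value}}, "weight": weight}
                ],
                # Multiply the query score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    import json

    print(json.dumps(build_function_score_query(None), indent=2))
```

The functions list can hold several filter and weight pairs, which is where this becomes more flexible than a single negative_boost.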