Sorting elastic hits on 'prefix first' logic
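I want to implement a sorted result set for autosuggest, in which entries that begin with the search term appear at the top, followed by entries that only contain it somewhere in the text. For example:
Search term: advocate
Results:
advocate x
advocate Yx
some advocate
However, my result set gives a higher score to results that contain the term than to those that begin with it. How do I fix this?
Mapping (js):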
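To pin down the requested order, here is a minimal client-side sketch in Java (the language of the query string below) that ranks strings starting with the term ahead of strings that merely contain it. It assumes case-insensitive matching, and `PrefixFirstOrder`, `rank` and `sort` are hypothetical names. It only illustrates the target ordering and is not how Elasticsearch scores hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Client-side illustration of the "prefix first" order (not an Elasticsearch feature).
class PrefixFirstOrder {

    // 0 = starts with the term, 1 = contains it elsewhere, 2 = no match
    static int rank(String text, String term) {
        String t = text.toLowerCase(Locale.ROOT);
        String q = term.toLowerCase(Locale.ROOT);
        if (t.startsWith(q)) return 0;
        if (t.contains(q)) return 1;
        return 2;
    }

    static List<String> sort(List<String> texts, String term) {
        List<String> sorted = new ArrayList<>(texts);
        // List.sort is stable: items with the same rank keep their input order
        sorted.sort(Comparator.comparingInt(s -> rank(s, term)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> hits = List.of("some advocate", "advocate Yx", "advocate x");
        System.out.println(sort(hits, "advocate"));
    }
}
```

Running `main` prints `[advocate Yx, advocate x, some advocate]`, which is the order the question asks Elasticsearch to produce.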
{
"settings": {
"index": {
"max_ngram_diff": 39
},
"analysis": {
"normalizer": {
"custom_normalizer": {
"type": "custom",
"char_filter": [],
"filter": [
"lowercase",
"asciifolding"
]
}
},
"analyzer": {
"custom_analyzer": {
"tokenizer": "custom_tokenizer",
"filter": [
"lowercase"
]
},
"autocomplete_search": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "ngram",
"min_gram": 1,
"max_gram": 40,
"token_chars": [
"letter",
"digit",
"whitespace",
"punctuation",
"symbol"
]
}
}
}
},
"mappings": {
"relations": {
"properties": {
"primaryTerm": {
"type": "text",
"analyzer": "custom_analyzer",
"search_analyzer": "autocomplete_search",
"fielddata": "true",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "custom_normalizer"
}
}
},
"entityType": {
"type": "keyword",
"normalizer": "custom_normalizer"
},
"variants": {
"type": "text",
"analyzer": "custom_analyzer",
"search_analyzer": "autocomplete_search",
"fielddata": "true",
"fields": {
"raw": {
"type": "keyword",
"normalizer": "custom_normalizer"
}
}
}
}
}
}
}
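For context on why this ngram field cannot tell "starts with" from "contains": `custom_analyzer` indexes every substring of 1 to 40 characters, while `autocomplete_search` turns the query into the single lowercase token `advocate`. The sketch below is a simplified Java model of that behaviour, assuming (as the `token_chars` list suggests) that no character splits the text, so grams can span spaces; `NgramModel` is a hypothetical name. Any title that contains "advocate" anywhere yields that exact gram, so both titles match in the same way.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Simplified model of custom_analyzer: ngram tokenizer (min_gram 1, max_gram 40) + lowercase.
// Assumption: every character counts as a token char, so grams can span spaces.
class NgramModel {

    // A Set is enough because we only check whether a gram exists;
    // the real token stream keeps duplicates and positions.
    static Set<String> ngrams(String text, int minGram, int maxGram) {
        String s = text.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < s.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= s.length(); len++) {
                grams.add(s.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // autocomplete_search (keyword tokenizer + lowercase) keeps the query as one token
        String queryToken = "advocate";
        String[] titles = {"ADVOCATE AND PATENTS & TRADE MARKS ATTORNEY", "TALWAR ADVOCATES"};
        for (String title : titles) {
            boolean matches = ngrams(title, 1, 40).contains(queryToken);
            System.out.println(title + " -> contains gram \"" + queryToken + "\": " + matches);
        }
    }
}
```

Because the match ignores where the gram starts, the remaining score differences appear to come from BM25 field-length normalization: a longer title produces many more grams. This is consistent with the results below, where the scores fall steadily from the shortest title (ACT ADVOCATES) to the longest one (ADVOCATE AND PATENTS & TRADE MARKS ATTORNEY).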
Search query:
String query = "{\"bool\": {\"should\": [{\"query_string\": {\"query\": \"advocate\", \"fields\": [\"primaryTerm\"]}}, {\"query_string\": {\"query\": \"advocate\", \"fields\": [\"primaryTerm.raw^2\"]}}]}}";
Elasticsearch result:
{
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": 12,
    "max_score": 6.094379,
    "hits": [
      {"_index": "agencyvars", "_type": "relations", "_id": "qCeqHHgBcFeeTWhjAoua", "_score": 6.094379,
       "_source": {"entityType": "Agency", "primaryTerm": "ACT ADVOCATES", "variants": []}},
      {"_index": "agencyvars", "_type": "relations", "_id": "OyeqHHgBcFeeTWhjJYxu", "_score": 5.6339674,
       "_source": {"entityType": "Agency", "primaryTerm": "TALWAR ADVOCATES", "variants": ["TALWAR & ADVOCATES"]}},
      {"_index": "agencyvars", "_type": "relations", "_id": "BSeqHHgBcFeeTWhjGIyJ", "_score": 5.1183944,
       "_source": {"entityType": "Agency", "primaryTerm": "ZEUSIP ADVOCATES LLP", "variants": ["ZEUS IP, ADVOCATES", "ZEUSIP ADVOCATES", "ZEUS IP ADVOCATES", "ZEUS IP", "ZEUSIPADVOCATES LLP", "ZIUSIP ADVOCATES"]}},
      {"_index": "agencyvars", "_type": "relations", "_id": "3CeqHHgBcFeeTWhjTYyZ", "_score": 4.6892724,
       "_source": {"entityType": "Agency", "primaryTerm": "MURTI & MURTI ADVOCATES", "variants": []}},
      {"_index": "agencyvars", "_type": "relations", "_id": "0SeqHHgBcFeeTWhjjI18", "_score": 4.4118576,
       "_source": {"entityType": "Agency", "primaryTerm": "ANAND AND ANAND ADVOCATES", "variants": ["AANAND & ANAND ADVOCATES", "NAND AND ANAND ADVOCATES", "ANAND & ANAND, ADVOCATES", "ANAND & ANAND ADVOCATES", "ANAND & ANAND,ADVOCATES", "ANAND & ANAND", "ANAND&ANAND", "ANAND AND ANAND ADVOCAETES", "ANAND AND ANAND ADVOCATE", "ANAND AND ANANDADVOCATES", "AND ANAND ADVOCATES", "ANAND & ANAND ADVOCATES.", "ANAND AND ANAN", "ANAND AND ANAND", "ANAND AND ANAND ADVOCATES,", "ANAND AND ANAND ADVOCATES.", "ANAND AND ANAND , ADVOCATES", "ANAND AND"]}},
      {"_index": "agencyvars", "_type": "relations", "_id": "2CeqHHgBcFeeTWhjTIyn", "_score": 3.2560868,
       "_source": {"entityType": "Agency", "primaryTerm": "STAR IP Advocates and IPR Attorneys", "variants": ["STARIP, ADVOCATES & IP ATTORNEYS"]}},
      {"_index": "agencyvars", "_type": "relations", "_id": "3yeqHHgBcFeeTWhjD4uW", "_score": 2.521993,
       "_source": {"entityType": "Agency", "primaryTerm": "ADVOCATE AND PATENTS & TRADE MARKS ATTORNEY", "variants": []}}
    ]
  }
}
In summary, the scores are:
"_score": 5.6339674, "_source": {"primaryTerm": "TALWAR ADVOCATES"}
"_score": 5.1183944, "_source": {"primaryTerm": "INTELLEXIP ADVOCATES"}
"_score": 2.521993, "_source": {"primaryTerm": "ADVOCATE AND PATENTS & TRADE MARKS ATTORNEY"}
PS: Since I am new to Elasticsearch, a short explanation along with the answer would be much appreciated.
To apply prefix-first logic, you can use a prefix query together with the boost parameter. Try the query below:
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "advocate",
"fields": [
"primaryTerm"
]
}
},
{
"prefix": {
"primaryTerm.raw": {
"value": "advocate",
"boost": 2
}
}
}
]
}
}
}
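If you build the request body as a Java string, as in the question, a small helper keeps the escaping in one place. This is a hypothetical sketch (`PrefixFirstQuery`, `build` and `escapeJson` are made-up names) that produces the same body as the query above, with the term and boost as parameters. It only escapes JSON quotes and backslashes, so characters that have special meaning in `query_string` syntax are not handled.

```java
// Hypothetical helper: builds the bool/should body (query_string + boosted prefix) as JSON text.
class PrefixFirstQuery {

    // Minimal JSON string escaping: backslashes first, then double quotes
    static String escapeJson(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static String build(String term, double boost) {
        String t = escapeJson(term);
        return "{\"query\":{\"bool\":{\"should\":["
                + "{\"query_string\":{\"query\":\"" + t + "\",\"fields\":[\"primaryTerm\"]}},"
                + "{\"prefix\":{\"primaryTerm.raw\":{\"value\":\"" + t + "\",\"boost\":" + boost + "}}}"
                + "]}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("advocate", 10));
    }
}
```

The boost is printed as a JSON number such as `10.0`. How the string is sent to the cluster depends on the client you use, which the question does not show.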
The search result will be:
"hits": [
{
"_index": "67049029",
"_type": "_doc",
"_id": "1",
"_score": 2.0386105,
"_source": {
"primaryTerm": "ADVOCATE AND PATENTS & TRADE MARKS ATTORNEY"
}
},
{
"_index": "67049029",
"_type": "_doc",
"_id": "3",
"_score": 0.08597656,
"_source": {
"primaryTerm": "TALWAR ADVOCATES"
}
},
{
"_index": "67049029",
"_type": "_doc",
"_id": "2",
"_score": 0.07815027,
"_source": {
"primaryTerm": "INTELLEXIP ADVOCATES"
}
}
]
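Update 1:
A boost of 2 is not enough in your case. The prefix query is scored as a constant (its default rewrite is constant_score), so inside the bool/should query a document whose primaryTerm starts with the term gets the boost added to its query_string score; in the results above, 2.0386105 is 0.0386105 + 2. In your index, "ADVOCATE AND PATENTS & TRADE MARKS ATTORNEY" scores 2.521993 from the query_string clause, so with a boost of 2 it only reaches 4.521993, still below "TALWAR ADVOCATES" (5.6339674) and "ACT ADVOCATES" (6.094379). A boost of 10 will therefore work for you. More generally, for these scores the boost has to exceed the gap between the best non-prefix score and the prefix match's own score, 6.094379 - 2.521993 ≈ 3.57, so not every value above 2 is sufficient.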
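The effect of the boost can be checked with a toy Java model of the bool/should query. It assumes that the prefix clause, which Elasticsearch runs with the constant_score rewrite by default, adds exactly its boost to each document whose primaryTerm starts with the term (the answer's own result of 2.0386105 for the prefix match with boost 2 fits this), and that the query_string scores stay as reported in the question. Only three of the reported hits are included, and `BoostCheck`, `Hit`, `score` and `rank` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Toy model (assumption): final score = query_string score
// + boost if primaryTerm starts with the term (constant-score prefix clause), else 0.
class BoostCheck {

    static final class Hit {
        final String primaryTerm;
        final double queryStringScore; // score reported for the query_string clause

        Hit(String primaryTerm, double queryStringScore) {
            this.primaryTerm = primaryTerm;
            this.queryStringScore = queryStringScore;
        }
    }

    static double score(Hit h, String term, double boost) {
        boolean prefix = h.primaryTerm.toLowerCase(Locale.ROOT)
                .startsWith(term.toLowerCase(Locale.ROOT));
        // bool/should sums the scores of the clauses that match
        return h.queryStringScore + (prefix ? boost : 0.0);
    }

    static List<Hit> rank(List<Hit> hits, String term, double boost) {
        List<Hit> sorted = new ArrayList<>(hits);
        // Highest score first, as in an Elasticsearch hit list
        sorted.sort(Comparator.comparingDouble((Hit h) -> score(h, term, boost)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("ACT ADVOCATES", 6.094379),
                new Hit("TALWAR ADVOCATES", 5.6339674),
                new Hit("ADVOCATE AND PATENTS & TRADE MARKS ATTORNEY", 2.521993));
        for (double boost : new double[] {2, 10}) {
            Hit top = rank(hits, "advocate", boost).get(0);
            System.out.println("boost " + boost + " -> top hit: " + top.primaryTerm);
        }
    }
}
```

With boost 2 the top hit stays ACT ADVOCATES (2.521993 + 2 = 4.521993), while with boost 10 the prefix match moves to the top.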