How to give a higher score to exact matches than phonetic ones in Elasticsearch?
I'm currently using a phonetic analyzer in Elasticsearch. I want exact matches to score higher than phonetic matches. This is the query I'm using:
{
"query": {
"multi_match" : {
"query" : "Abhijeet",
"fields" : ["content", "title"]
}
},
"size": 10,
"_source": [ "title", "bench", "court", "id_" ],
"highlight": {
"fields" : {
"title" : {},
"content":{}
}
}
}
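When I search for Abhijeet, the top results are Abhijit, and only after them does Abhijeet appear. I want exact matches to always come first, followed by phonetic matches. Can this be done?
Edit:
Mappings: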
{
"courts_2": {
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
}
Here is the code I used to set up the phonetic analyzer:
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"filter": [
"lowercase",
"my_metaphone"
]
}
},
"filter": {
"my_metaphone": {
"type": "phonetic",
"encoder": "metaphone",
"replace": true
}
}
}
}
},
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
For now, I only want to query the title and content fields. Here, I want exact matches to appear first, followed by phonetic matches.
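The general approach is:
- use a bool query,
- with your phonetic query/queries in the must clause,
- and the non-phonetic query/queries in the should clause.
If you include the index's mappings and settings in your question, I can update the answer.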
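To see why this ordering of clauses helps, recall that a bool query adds up the scores of every matching must and should clause. The sketch below works through that arithmetic with made-up scores (the numbers are assumptions chosen for illustration, not values Elasticsearch would return for this index):

```python
# Hypothetical relevance scores, chosen only to illustrate the arithmetic.
# must clause (phonetic field): both spellings share the same metaphone token,
# so both documents match; here the "Abhijit" document happens to score higher.
phonetic_score = {"Abhijit": 1.5, "Abhijeet": 1.25}

# should clause (non-phonetic field): only the exact spelling matches.
exact_score = {"Abhijit": 0.0, "Abhijeet": 1.0}


def bool_score(doc):
    """A bool query sums the scores of its matching must and should clauses."""
    return phonetic_score[doc] + exact_score[doc]


ranking = sorted(phonetic_score, key=bool_score, reverse=True)
print(ranking)  # ['Abhijeet', 'Abhijit']
```

Note that the exact match only wins when its extra should score outweighs any phonetic score gap, so a boost on the should clause may be needed in practice.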
Update: solution approach
A. Extend your mapping to use multi-fields for title and content:
"title": {
"type": "text",
"analyzer": "my_analyzer",
"fields" : {
"standard" : {
"type" : "text"
}
}
},
...
"content": {
"type": "text",
"analyzer": "my_analyzer",
"fields" : {
"standard" : {
"type" : "text"
}
}
},
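The fragment above only shows the changed properties. As a rough sketch, the complete body for a `PUT courts_2/_mapping` request could be assembled as below; the helper name `multi_field_mapping` is hypothetical, and the sketch assumes the sub-field should use the default (standard) analyzer:

```python
import json


def multi_field_mapping(fields, analyzer="my_analyzer", subfield="standard"):
    """Build a body for PUT <index>/_mapping (hypothetical helper).

    Each field keeps its existing type and phonetic analyzer and gains a
    sub-field that is indexed with the default (standard) analyzer.
    """
    return {
        "properties": {
            name: {
                "type": "text",
                "analyzer": analyzer,
                "fields": {subfield: {"type": "text"}},
            }
            for name in fields
        }
    }


body = multi_field_mapping(["title", "content"])
print(json.dumps(body, indent=2))
```

The existing type and analyzer are restated unchanged because a mapping update may add sub-fields to a field but cannot change how the field itself is analyzed.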
B. Get the fields populated (e.g. by re-indexing everything):
POST courts_2/_update_by_query
C. Adjust your query to make use of the newly introduced fields:
GET courts_2/_search
{
"_source": ["title","bench","court","id_"],
"size": 10,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "Abhijeet",
"fields": ["title", "content"]
}
},
"should": {
"multi_match": {
"query": "Abhijeet",
"fields": ["title.standard", "content.standard"]
}
}
}
},
"highlight": {
"fields": {
"title": {},
"content": {}
}
}
}
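If exact matches still do not rank first consistently, one option is to boost the should clause. The following is a minimal sketch that builds the same bool query with an adjustable boost; the helper name `exact_first_query` and the boost value of 5.0 are assumptions to be tuned against real data, not recommended settings:

```python
import json


def exact_first_query(text, fields, subfield="standard", exact_boost=1.0):
    """Build the "query" part of a search body (hypothetical helper).

    must:   phonetic fields, so both exact and sound-alike spellings match.
    should: the non-phonetic sub-fields, which add score for exact spellings.
    """
    exact_fields = [f"{name}.{subfield}" for name in fields]
    return {
        "bool": {
            "must": {
                "multi_match": {"query": text, "fields": list(fields)}
            },
            "should": {
                "multi_match": {
                    "query": text,
                    "fields": exact_fields,
                    "boost": exact_boost,  # assumed value, tune as needed
                }
            },
        }
    }


query = exact_first_query("Abhijeet", ["title", "content"], exact_boost=5.0)
print(json.dumps({"query": query, "size": 10}, indent=2))
```

A larger boost widens the gap between exact and phonetic-only hits, at the cost of letting the exact-match signal dominate the other relevance factors.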