Exact match and fuzziness: what is a good approach?
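I've spent a lot of time trying to find the best way to build an autocomplete that supports multilingual city search (ES/EN), tolerates fuzziness, and gives priority to exact matches (showing them at the top of the results), but I can't find a good way to do it.
My current solution works very well in many cases, but when I search for Roma, the first option is "Iasi-East Romania, romania", while Roma, Italy (the exact match) is the thirtieth option.
Result JSON: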
[
{"_index":"destinations","_type":"doc","_id":"_X80XWcBn2nzTu98N7_F","_score":75.50012,"_source":{"destination_name_en":"Iasi-East Romania","destination_name_es":"Iasi-East Romania","destination_name_pt":"Iasi-East Romania","country_code":"RO","country_name":"ROMANIA","destination_id":7953,"popularity":"0"}},
{"_index":"destinations","_type":"doc","_id":"7380XWcBn2nzTu98OMZl","_score":73.116455,"_source":{"destination_name_en":"La Romana","destination_name_es":"La Romana","destination_name_pt":"La Romana","country_code":"DO","country_name":"DOMINICAN REPUBLIC","destination_id":2816,"popularity":"0"}},
{"_index":"destinations","_type":"doc","_id":"1X80XWcBn2nzTu98OMZl","_score":71.4391,"_source":{...}},
...
{"_index":"destinations","_type":"doc","_id":"8H80XWcBn2nzTu98OMZl","_score":52.018818,"_source":{"destination_name_en":"Rome","destination_name_es":"Roma","destination_name_pt":"Roma","country_code":"IT","country_name":"ITALY","destination_id":6338,"popularity":"0"}}
]
This is my best solution so far.
Mapping:
'settings' => [
    'analysis' => [
        'filter' => [
            'autocomplete_filter' => [
                "type" => "edge_ngram",
                "min_gram" => 1,
                "max_gram" => 20,
            ]
        ],
        'analyzer' => [
            'autocomplete' => [
                "type" => "custom",
                'tokenizer' => "standard",
                'filter' => ['lowercase', 'asciifolding', 'autocomplete_filter'],
            ]
        ],
    ],
],
'mappings' => [
    'doc' => [
        "properties" => [
            "destination_name_en" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "destination_name_es" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "destination_name_pt" => [
                "type" => "text",
                "analyzer" => "autocomplete",
                "search_analyzer" => "standard",
            ],
            "popularity" => [
                "type" => "integer",
            ]
        ]
    ]
]
Search:
'query' => [
    "bool" => [
        "should" => [
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "type" => "most_fields",
                    "boost" => 2
                ]
            ],
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "fuzziness" => "1",
                    "prefix_length" => 2
                ]
            ]
        ]
    ]
]
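Additionally, I'd like to boost specific destinations based on their popularity value.
I hope someone can point me to an example or in the right direction.
Thanks a lot.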
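The problem is that when you search for roma, Iasi-East Romania comes first because its name contains roma in every language, so every destination_name_* field matches and most_fields adds those scores together. But roma only matches Rome in ES/PT/IT, not in EN.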
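To make this concrete, here is a rough sketch of what the autocomplete analyzer from the question indexes for a single word. `edgeNgrams` is a hypothetical helper written only for this illustration (it is not part of Elasticsearch or its PHP client); it imitates lowercasing followed by an edge_ngram filter with min_gram 1 and max_gram 20, and ignores asciifolding and multi-byte characters.

```php
<?php
// Hypothetical helper: imitates lowercase + edge_ngram (min_gram 1, max_gram 20)
// for one ASCII token. Illustration only, not an Elasticsearch API.
function edgeNgrams(string $token, int $min = 1, int $max = 20): array
{
    $token = strtolower($token);
    $grams = [];
    $limit = min($max, strlen($token));
    for ($len = $min; $len <= $limit; $len++) {
        // Each gram is a prefix of the token, growing one character at a time.
        $grams[] = substr($token, 0, $len);
    }
    return $grams;
}

// "Romania" is indexed as r, ro, rom, roma, roman, romani, romania,
// so the search term "roma" hits it in every language field.
var_dump(in_array('roma', edgeNgrams('Romania'), true)); // bool(true)

// "Rome" (the EN name) is indexed as r, ro, rom, rome: no "roma" token.
var_dump(in_array('roma', edgeNgrams('Rome'), true));    // bool(false)
```

So a document whose three name fields all contain a word starting with roma collects a score from each field, while Rome collects one only from the fields where it is stored as Roma.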
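So if you want to boost exact matches, you need to index your city names in another field without the autocomplete analyzer (for every language), and add a new clause on these fields to the should.
Mapping example: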
"properties" => [
    "destination_name_en" => [
        "type" => "text",
        "analyzer" => "autocomplete",
        "search_analyzer" => "standard",
        // sub-field indexed without edge n-grams, queried as destination_name_en.exact
        "fields" => [
            "exact" => [
                "type" => "text",
                "analyzer" => "standard", // you could use a fancier analyzer here
            ]
        ]
    ],
    ....
And in the query:
'query' => [
    "bool" => [
        "should" => [
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "type" => "most_fields",
                    "boost" => 2
                ]
            ],
            [
                "multi_match" => [
                    "query" => $text,
                    "fields" => [
                        "destination_name_*"
                    ],
                    "fuzziness" => "1",
                    "prefix_length" => 2
                ]
            ],
            // new clause: rewards documents whose full name token matches the input
            [
                "multi_match" => [
                    "query" => $text,
                    "type" => "most_fields",
                    "fields" => [
                        "destination_name_*.exact"
                    ],
                    "boost" => 2
                ]
            ]
        ]
    ]
]
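Could you try something like this and let us know?
This works like a charm! Now I get Rome as the first result, and errors at the end of the word are tolerated too: Romi also returns Rome in first place.
Now I'm trying to boost results by the popularity field (I have two Romes, Rome in Italy and Rome in Australia), and I want to boost some of the world's popular cities.
I'm using function_score, but it gives me very strange results.
This is my current code: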
'query' => [
    'function_score' => [
        'field_value_factor' => [
            'field' => 'popularity',
        ],
        "score_mode" => "multiply",
        'query' => [
            "bool" => [
                "should" => [
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "type" => "most_fields",
                            "boost" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*"
                            ],
                            "fuzziness" => "1",
                            "prefix_length" => 2
                        ]
                    ],
                    [
                        "multi_match" => [
                            "query" => $text,
                            "fields" => [
                                "destination_name_*.exact"
                            ],
                            "boost" => 2
                        ]
                    ]
                ]
            ]
        ]
    ],
],
Any suggestions?
PS: Thank you very much for your help. I'm marking yours as the best answer, since you've solved the main problem.
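One possible cause of the strange results, offered as a guess rather than a diagnosis: the sample documents above all have popularity 0. field_value_factor without a modifier returns the raw field value, and function_score multiplies it into the query score by default, so every such document ends up with a score of 0. A minimal sketch of one way around that, assuming popularity is never negative, is to use the ln2p modifier (natural log of the value plus 2, which is always positive) and to set boost_mode explicitly. The 'roma' input and the `$body` variable are placeholders for illustration.

```php
<?php
$text = 'roma'; // placeholder for the user's search text

$body = [
    'query' => [
        'function_score' => [
            // Same three should clauses as in the code above.
            'query' => [
                'bool' => [
                    'should' => [
                        ['multi_match' => [
                            'query'  => $text,
                            'fields' => ['destination_name_*'],
                            'type'   => 'most_fields',
                            'boost'  => 2,
                        ]],
                        ['multi_match' => [
                            'query'         => $text,
                            'fields'        => ['destination_name_*'],
                            'fuzziness'     => '1',
                            'prefix_length' => 2,
                        ]],
                        ['multi_match' => [
                            'query'  => $text,
                            'fields' => ['destination_name_*.exact'],
                            'boost'  => 2,
                        ]],
                    ],
                ],
            ],
            'field_value_factor' => [
                'field'    => 'popularity',
                'modifier' => 'ln2p', // ln(popularity + 2): about 0.69 at popularity 0, never 0
                'missing'  => 0,      // assumption: treat a missing popularity as 0
            ],
            // boost_mode controls how the function result is combined with the query score.
            'boost_mode' => 'multiply',
        ],
    ],
];

// Print the request body to inspect what would be sent to Elasticsearch.
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

Note that score_mode only governs how several functions are combined with each other; with a single field_value_factor it is boost_mode that decides how popularity affects the final score. The logarithm also keeps a very popular city from completely overriding text relevance, though the right curve depends on how the popularity values are distributed.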