Exact match and fuzziness... What is a good way?

Question

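I have spent a lot of time trying to find the best way to build a city search with autocomplete that supports multiple languages (ES/EN), tolerates typos (fuzziness), and gives priority to exact matches (showing them at the top of the results), but I can't find a good way to accomplish this.

My current solution works very well in many cases, but when I search for Roma, the first result is "Iasi-East Romania, romania", while Roma, Italy (the exact match) only appears further down the list.

Result JSON: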

[{"_index":"destinations","_type":"doc","_id":"_X80XWcBn2nzTu98N7_F","_score":75.50012,"_source":{"destination_name_en":"Iasi-East Romania","destination_name_es":"Iasi-East Romania","destination_name_pt":"Iasi-East Romania","country_code":"RO","country_name":"ROMANIA","destination_id":7953,"popularity":"0"}},{"_index":"destinations","_type":"doc","_id":"7380XWcBn2nzTu98OMZl","_score":73.116455,"_source":{"destination_name_en":"La Romana","destination_name_es":"La Romana","destination_name_pt":"La Romana","country_code":"DO","country_name":"DOMINICAN REPUBLIC","destination_id":2816,"popularity":"0"}},{"_index":"destinations","_type":"doc","_id":"1X80XWcBn2nzTu98OMZl","_score":71.4391,"_source":{"_index":"destinations","_type":"doc","_id":"8H80XWcBn2nzTu98OMZl","_score":52.018818,"_source":{"destination_name_en":"Rome","destination_name_es":"Roma","destination_name_pt":"Roma","country_code":"IT","country_name":"ITALY","destination_id":6338,"popularity":"0"}}]

This is my best solution so far.

Mapping:

'settings' => [ 
                'analysis' => [     
                    'filter' => [
                        'autocomplete_filter' => [
                            "type"=> "edge_ngram",
                            "min_gram"=> 1,
                            "max_gram"=> 20,

                        ]
                    ],
                    'analyzer' => [
                        'autocomplete' => [
                            "type" => "custom",
                            'tokenizer' => "standard",
                            'filter' => ['lowercase', 'asciifolding', 'autocomplete_filter'],
                        ]
                    ],

                ],   
            ],
            'mappings' =>[
                'doc' => [
                    "properties"=> [
                        "destination_name_en"=> [
                           "type"=> "text",
                           "analyzer"=> "autocomplete",
                           "search_analyzer"=> "standard",

                        ],
                        "destination_name_es"=> [
                           "type"=> "text",
                           "analyzer"=> "autocomplete",
                           "search_analyzer"=> "standard",
                        ],
                        "destination_name_pt"=> [
                           "type"=> "text",
                           "analyzer"=> "autocomplete",
                           "search_analyzer"=> "standard",
                        ],
                        "popularity"=> [
                           "type"=> "integer",
                        ]
                    ]
                ]
            ]

Search:

'query' => [
                "bool" => [
                    "should" => [   
                         [
                            "multi_match"=>[
                                "query"=>$text,
                                "fields"=>[
                                   "destination_name_*"
                                ],
                                "type"=>"most_fields",
                                "boost" => 2
                            ]
                        ],
                        [
                            "multi_match"=>[
                                "query"=>$text,
                                "fields"=>[
                                   "destination_name_*"
                                ],
                                "fuzziness" => "1",
                                "prefix_length"=> 2                                   
                            ]
                        ]
                    ]
                ]
            ]

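In addition, I would like to boost specific destinations using their popularity value.

I hope someone can give me an example or point me in the right direction.

Thanks a lot.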

Answer 1

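The problem is that when you search for roma, Iasi-East Romania comes first because it contains roma in every language field, whereas roma matches Rome only in ES/PT/IT and not in EN. Since the first clause uses most_fields, the scores of all matching fields are combined, so a document that matches in every field outranks one that matches in only some of them.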
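To make this concrete, here is a rough PHP simulation of what the `autocomplete` analyzer from the question indexes. It is only a sketch: the helper `edgeNgrams` is hypothetical, it approximates the standard tokenizer by splitting on non-alphanumeric characters, and it skips `asciifolding` because the sample names are plain ASCII.

```php
<?php
// Approximation of the question's "autocomplete" analyzer (assumption, not
// Elasticsearch code): split into words, lowercase, emit edge n-grams.
function edgeNgrams(string $text, int $min = 1, int $max = 20): array
{
    $grams = [];
    // Stand-in for the standard tokenizer: split on anything non-alphanumeric.
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        $limit = min($max, strlen($word));
        // Emit every prefix of the word from $min up to $limit characters.
        for ($len = $min; $len <= $limit; $len++) {
            $grams[] = substr($word, 0, $len);
        }
    }
    return $grams;
}

// Check which indexed names contain the search term "roma" as a token.
foreach (['Iasi-East Romania', 'La Romana', 'Roma', 'Rome'] as $name) {
    $hit = in_array('roma', edgeNgrams($name), true) ? 'matches' : 'does not match';
    echo "$name: $hit \"roma\"\n";
}
```

Under these assumptions, "Romania", "Romana" and "Roma" all produce the token `roma`, while "Rome" only produces `r`, `ro`, `rom` and `rome`. That is why the English field of Rome contributes nothing to the score for this search.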

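So if you want to boost exact matches, you need to index your city names in an additional field without the autocomplete analyzer (for all languages), and add a new clause on these fields to the should.

Example mapping: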

 "properties"=> [
        "destination_name_en"=> [
                "type"=> "text",
                "analyzer"=> "autocomplete",
                "search_analyzer"=> "standard",
                "fields" => [
                    "exact" => [
                        "type"=> "text",
                        "analyzer"=> "standard", // you could use a more fancy analyzer here
                    ]

                ]
        ],
....

And in the query:

'query' => [
                "bool" => [
                    "should" => [   
                         [
                            "multi_match"=>[
                                "query"=>$text,
                                "fields"=>[
                                   "destination_name_*"
                                ],
                                "type"=>"most_fields",
                                "boost" => 2
                            ]
                        ],
                        [
                            "multi_match"=>[
                                "query"=>$text,
                                "fields"=>[
                                   "destination_name_*"
                                ],
                                "fuzziness" => "1",
                                "prefix_length"=> 2                                   
                            ]
                        ],
                        [
                            "multi_match"=>[
                                "query"=>$text,
                                "type"=>"most_fields",
                                "fields"=>[
                                   "destination_name_*.exact"
                                ],
                                "boost" => 2 
                            ]
                        ]
                    ]
                ]
            ]

Could you try something like this and let us know?

Answer 2

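This works like a charm! Now I get Rome as the first result, and mistakes at the end of the word are tolerated as well: "Romi" also returns Rome in first position.

Now I am trying to boost results by the popularity field (I have two Romes: Rome, Italy and Roma, Australia), because I want to boost some of the world's popular cities.

I am using function_score, but it gives me very strange results.

This is my current code: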

'query' => [
                'function_score' => [
                    'field_value_factor' => [
                        'field' => 'popularity',
                    ],
                    "score_mode" => "multiply",
                    'query' => [
                        "bool" => [
                            "should" => [   
                                 [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*"
                                        ],
                                        "type"=>"most_fields",
                                        "boost" => 2
                                    ]
                                ],
                                [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*"
                                        ],
                                        "fuzziness" => "1",
                                        "prefix_length"=> 2                                   
                                    ]
                                ],
                                [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*.exact"
                                        ],
                                        "boost" => 2                                   
                                    ]
                                ]
                            ]
                        ]
                    ]
                ],
            ],

Any suggestions?

PS: Thank you very much for your help. I am marking yours as the best answer, since you solved the main problem.
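One possible explanation for the strange scores: `field_value_factor` without a `modifier` uses the raw field value, and the default `boost_mode` of `function_score` is `multiply`. The sample documents in the question all have a popularity of 0, so their text score would be multiplied by 0. The sketch below is one way around that, not a confirmed fix. The helper name `withPopularityBoost` is hypothetical, and the `factor`, `modifier` and `boost_mode` values are assumptions that would need tuning against the real popularity range.

```php
<?php
// Hypothetical helper: wraps an existing bool query in a function_score.
// The factor, modifier and boost_mode values are illustrative assumptions.
function withPopularityBoost(array $boolQuery): array
{
    return [
        'function_score' => [
            'query' => $boolQuery,
            'field_value_factor' => [
                'field'    => 'popularity',
                'factor'   => 1.0,
                'modifier' => 'log1p', // dampens large values; a popularity of 0 yields 0
                'missing'  => 0,       // treat documents without the field as unpopular
            ],
            // Add the popularity bonus to the text score instead of multiplying,
            // so a popularity of 0 does not wipe out the relevance score.
            'boost_mode' => 'sum',
        ],
    ];
}

$text = 'roma';
$query = withPopularityBoost([
    'bool' => [
        'should' => [
            ['multi_match' => [
                'query'  => $text,
                'fields' => ['destination_name_*.exact'],
                'boost'  => 2,
            ]],
        ],
    ],
]);

// Print the request body to inspect it before sending it to Elasticsearch.
echo json_encode(['query' => $query], JSON_PRETTY_PRINT), "\n";
```

Only one `should` clause is shown to keep the sketch short; the other two clauses from the code above would go into the same array. Note that `score_mode` only controls how several functions are combined with each other, while `boost_mode` controls how the function result is combined with the query score.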

Tags: fuzzy-search, autocomplete, exact-match, elasticsearch