How can I refine my Elasticsearch query to better distinguish its results?
My challenge is to build an autocomplete field (Django and ES) where searching for "apeni", "rua apen", or "roa apen" returns "rua apeninos" as the top (or only) option. I have already tried ES's suggest and completion features, but both are prefix-based (they don't work with "apen"). I also tried wildcards, but those can't be combined with fuzziness (so they fail on "roa apini" or "apini"). For now, I am using fuzzy matching.
But even when the query term differs, like "rua ape" or "rua apot", it returns the same two documents, with street_desc equal to "rua apeninos" and "rua apotribu", and both score 1.0.
Query:
{
  "aggs": {
    "addresses": {
      "filters": {
        "filters": {
          "street": {
            "match": {
              "street_desc": {
                "query": "rua ape",
                "fuzziness": "AUTO",
                "prefix_length": 0,
                "max_expansions": 50
              }
            }
          }
        }
      },
      "aggs": {
        "street_bucket": {
          "significant_terms": {
            "field": "street_desc.raw",
            "size": 3
          }
        }
      }
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
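Two details of this query explain the identical scores. The `match` lives inside a `filters` aggregation and there is no top-level `query`, so every hit keeps the default `_score` of 1.0. Also, `fuzziness: AUTO` allows 0 edits for terms of 1-2 characters, 1 edit for 3-5, and 2 edits for longer terms, so "ape" can never fuzzily reach "apeninos"; only the shared token "rua" matches both documents. A plain-Python edit-distance check (ignoring transpositions, which Elasticsearch also counts as one edit) illustrates the distances involved:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance (no transpositions)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def auto_fuzziness(term):
    # Elasticsearch "AUTO": 0 edits for length 1-2, 1 for 3-5, 2 for 6+
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2

print(levenshtein("ape", "apeninos"))  # 5, far beyond the single edit AUTO allows for "ape"
print(auto_fuzziness("ape"))           # 1
```

So fuzziness alone cannot turn "ape" into a match for "apeninos"; that is what the edge n-gram approach below addresses at index time.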
Index:
{
  "catalogs": {
    "mappings": {
      "properties": {
        "street_desc": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            }
          },
          "analyzer": "suggest_analyzer"
        }
      }
    }
  }
}
Analyzer (Python):
# elasticsearch-dsl analyzer definition
from elasticsearch_dsl import analyzer, tokenizer, token_filter

suggest_analyzer = analyzer(
    'suggest_analyzer',
    tokenizer=tokenizer("lowercase"),
    filter=[token_filter('stopbr', 'stop', stopwords="_brazilian_")],
    language="brazilian",
    char_filter=["html_strip"]
)
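For context, this analyzer lowercases, splits on non-letter characters, and drops Brazilian-Portuguese stopwords, so `street_desc` is indexed as whole words only; that is why mid-word input like "apen" can never match a whole indexed token. A rough Python simulation (the stopword set here is abbreviated and assumed; Lucene's `_brazilian_` list is much larger):

```python
import re

# a few entries from the Brazilian stopword list (abbreviated; an assumption)
BRAZILIAN_STOPWORDS = {"a", "de", "da", "do", "e", "o", "em", "na", "no"}

def suggest_analyze(text):
    """Approximate suggest_analyzer: lowercase tokenizer + Brazilian stop filter."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # letters only, lowercased
    return [t for t in tokens if t not in BRAZILIAN_STOPWORDS]

print(suggest_analyze("Rua de Apeninos"))
# whole tokens only, no prefixes or n-grams, e.g. ['rua', 'apeninos']
```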
Adding an end-to-end working example, which I tested against all of the given search terms.
Index mapping:
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}
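The `autocomplete_filter` above emits every prefix of each token, from 1 to 10 characters, at index time, which is why non-prefix input like "apeni" can still match. A small Python sketch approximating what this `edge_ngram` filter produces (assuming simple whitespace tokenization in place of the `standard` tokenizer):

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Approximate index-time tokens of the `autocomplete` analyzer:
    lowercase, split on whitespace, then emit edge n-grams per token."""
    grams = []
    for tok in text.lower().split():
        for n in range(min_gram, min(max_gram, len(tok)) + 1):
            grams.append(tok[:n])
    return grams

print(edge_ngrams("rua apeninos"))
# includes 'apen' and 'apeni', so those queries hit this document exactly
```

Because the search side uses the plain `standard` analyzer, the query terms themselves are not n-grammed; they just have to equal one of these indexed prefixes (optionally within the fuzzy edit budget).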
Index the sample documents:
{
"title" : "rua apotribu"
}
{
"title" : "rua apeninos"
}
Search query:
{
  "query": {
    "match": {
      "title": {
        "query": "apeni",
        "fuzziness": "AUTO"
      }
    }
  }
}
And the search result:
"hits": [
  {
    "_index": "64881760",
    "_type": "_doc",
    "_id": "1",
    "_score": 1.1026623,
    "_source": {
      "title": "rua apeninos"
    }
  }
]
Now "apen" also returns a result:
"hits": [
  {
    "_index": "64881760",
    "_type": "_doc",
    "_id": "1",
    "_score": 2.517861,
    "_source": {
      "title": "rua apeninos"
    }
  }
]
Now, when the query term differs, e.g. "rua apot", it returns both documents but scores "rua apotribu" much higher, as shown in the search results below.
"hits": [
  {
    "_index": "64881760",
    "_type": "_doc",
    "_id": "2",
    "_score": 2.9289336,
    "_source": {
      "title": "rua apotribu"
    }
  },
  {
    "_index": "64881760",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.41107285,
    "_source": {
      "title": "rua apeninos"
    }
  }
]