How can I refine my Elasticsearch query to better distinguish its results?

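My challenge is to create an autocomplete field (Django and ES) where I can search for "apeni", "rua apen", or "roa apen" and get "rua apeninos" as the main (or only) option. I have already tried suggest and completion in ES, but both use prefix matching (they don't work with "apen"). I also tried wildcard, but it can't be combined with fuzziness (it doesn't work with "roa apini" or "apini"). So now I'm using a fuzzy match.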

But even when the query terms differ, such as "rua ape" or "rua apot", it returns the same two documents, with street_desc equal to "rua apeninos" and "rua apotribu", and both have a score of 1.0.

Query:

{
   "aggs":{
      "addresses":{
         "filters":{
            "filters":{
               "street":{
                  "match":{
                     "street_desc":{
                        "query":"rua ape",
                        "fuzziness":"AUTO",
                        "prefix_length":0,
                        "max_expansions":50
                     }
                  }
               }
            }
         },
         "aggs":{
            "street_bucket":{
               "significant_terms":{
                  "field":"street_desc.raw",
                  "size":3
               }
            }
         }
      }
   },
   "sort":[
      {
         "_score":{
            "order":"desc"
         }
      }
   ]
}

Index:

{
   "catalogs":{
      "mappings":{
         "properties":{
            "street_desc":{
               "type":"text",
               "fields":{
                  "raw":{
                     "type":"keyword"
                  }
               },
               "analyzer":"suggest_analyzer"
            }
         }
      }
   }
}

Analyzer (Python):

from elasticsearch_dsl import analyzer, tokenizer, token_filter

suggest_analyzer = analyzer(
    'suggest_analyzer',
    tokenizer=tokenizer("lowercase"),
    filter=[token_filter('stopbr', 'stop', stopwords="_brazilian_")],
    language="brazilian",
    char_filter=["html_strip"]
)

Adding an end-to-end working example, which I tested with all the given search terms.

Index mapping

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "standard" 
      }
    }
  }
}
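
To see what the `autocomplete` analyzer above puts into the index, here is a rough pure-Python emulation of a lowercase step followed by an `edge_ngram` filter with `min_gram` 1 and `max_gram` 10. The function name `edge_ngrams` is hypothetical, and whitespace splitting stands in for the `standard` tokenizer, so treat this as an approximation rather than what Elasticsearch does internally.

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Approximate the index-time tokens: lowercase, split on whitespace,
    then emit every prefix of each word from min_gram to max_gram characters."""
    tokens = []
    for word in text.lower().split():
        longest = min(len(word), max_gram)
        for size in range(min_gram, longest + 1):
            tokens.append(word[:size])
    return tokens


indexed = edge_ngrams("rua apeninos")
print(indexed)
```

Because prefixes such as "apen" and "apeni" are stored as tokens of their own, a plain match query on those strings can find the document without any prefix or wildcard query.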

Index sample documents

{
   "title" : "rua apotribu"
}

{
   "title" : "rua apeninos"
}

Search query

{
    "query": {
        "match": {
            "title": {
                "query": "apeni",
                "fuzziness":"AUTO"
            }
        }
    }
}
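
From the Django side, the same request body can be built as a plain Python dict. This is a minimal sketch: the helper name `build_autocomplete_query` is hypothetical, and sending the body to a cluster is left out. The point it illustrates is that the match clause sits under the top-level `query` key. As far as I can tell, a request that only places the match inside an aggregation filter and has no top-level query (like the one in the question) runs as match_all, which gives every hit a score of 1.0.

```python
import json


def build_autocomplete_query(term, field="title"):
    """Return a search body whose match clause is under the top-level
    "query" key, so it both filters the hits and contributes to _score."""
    return {
        "query": {
            "match": {
                field: {
                    "query": term,
                    "fuzziness": "AUTO",
                }
            }
        }
    }


body = build_autocomplete_query("apeni")
print(json.dumps(body, indent=2))
```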

And the search result

  "hits": [
            {
                "_index": "64881760",
                "_type": "_doc",
                "_id": "1",
                "_score": 1.1026623,
                "_source": {
                    "title": "rua apeninos"
                }
            }
        ]

Now apen also returns a search result

 "hits": [
            {
                "_index": "64881760",
                "_type": "_doc",
                "_id": "1",
                "_score": 2.517861,
                "_source": {
                    "title": "rua apeninos"
                }
            }
        ]

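Now, when the query term is different, such as rua apot, it returns both documents, with a much higher score for rua apotribu, as shown in the search result below.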

 "hits": [
            {
                "_index": "64881760",
                "_type": "_doc",
                "_id": "2",
                "_score": 2.9289336,
                "_source": {
                    "title": "rua apotribu"
                }
            },
            {
                "_index": "64881760",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.41107285,
                "_source": {
                    "title": "rua apeninos"
                }
            }
        ]
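
The reason both documents come back for rua apot can be sketched in a few lines of Python. The helpers `index_tokens` and `matched_terms` are hypothetical, the sketch ignores fuzziness and the real relevance scoring, and it only counts which whole query words appear among the stored prefixes of each title.

```python
def index_tokens(title, max_gram=10):
    """Prefixes of each lowercased word, roughly what edge_ngram stores."""
    return {
        word[:size]
        for word in title.lower().split()
        for size in range(1, min(len(word), max_gram) + 1)
    }


def matched_terms(query, title):
    """Query words (kept whole, as the standard search analyzer would)
    that are present among the title's index tokens."""
    tokens = index_tokens(title)
    return [word for word in query.lower().split() if word in tokens]


for title in ("rua apotribu", "rua apeninos"):
    print(title, matched_terms("rua apot", title))
```

Here "rua apotribu" matches both "rua" and "apot", while "rua apeninos" matches only the shared word "rua", which fits the large score gap above. If a single shared word should not be enough to return a document, setting `"operator": "and"` in the match query is one option that might be worth trying; that variant was not tested here.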