How to make elasticsearch disregard spaces between certain queries?

Question

My elasticsearch documents have a field Name with entries like these:

Samsung Galaxy S3
Samsung Galaxy Ace Duos 3
Samsung Galaxy Duos 3
Samsung Galaxy S2
Samsung Galaxy S (I9000)

When I query this field with the following query (note the space between "s" and "3"):

{
  "query": {
    "match": {
      "Name": {
        "query": "galaxy s 3",
        "fuzziness": 2,
        "prefix_length": 1
      }
    }
  }
}

it returns "Samsung Galaxy Duos 3" as the relevant result instead of "Samsung Galaxy S3".

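I've noticed that the pattern for this kind of task is to ignore the space between any number and any single letter, and then run the query. For example, "I-phone 5s" should also be returned by "I-phone 5 s".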

Is there a good way to do this?

Answer 1

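You need to change the analyser so that it splits the string wherever it changes from text to a number. A regular expression helps here (this is based on the camelcase analyser):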

curl -XPUT 'localhost:9200/myindex/' -d '
     {
         "settings":{
             "analysis": {
                 "analyzer": {
                     "mynewanalyser":{
                         "type": "pattern",
                         "pattern":"([^\\p{L}\\d]+)|(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"
                     }
                 }
             }
         }
     }'
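To make the pattern easier to read, here is a rough sketch that tries its three alternatives one at a time. It uses Python's `re` module rather than the Java regex engine that Elasticsearch uses, so treat it as an approximation: `[\W_]+` stands in for `[^\p{L}\d]+` because `re` has no `\p{L}`, and `split_on` is a hypothetical helper name. It needs Python 3.7 or later, where `re.split` splits on zero-width matches.

```python
import re

# The three alternatives of the analyser pattern, rewritten for Python's re.
# Assumption: [\W_]+ approximates Java's [^\p{L}\d]+ (re has no \p{L}).
NON_ALNUM = r"[\W_]+"             # runs of spaces, punctuation, and so on
TEXT_TO_DIGIT = r"(?<=\D)(?=\d)"  # zero-width boundary: non-digit, then digit
DIGIT_TO_TEXT = r"(?<=\d)(?=\D)"  # zero-width boundary: digit, then non-digit


def split_on(pattern, text):
    """Hypothetical helper: split text on pattern and drop empty pieces."""
    return [part for part in re.split(pattern, text) if part]


print(split_on(NON_ALNUM, "Galaxy S3"))  # ['Galaxy', 'S3']
print(split_on(TEXT_TO_DIGIT, "S3"))     # ['S', '3']
print(split_on(DIGIT_TO_TEXT, "5s"))     # ['5', 's']
```

The two lookaround alternatives consume no characters, which is why "S3" becomes two tokens without losing either character.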

Testing the new analyser with your string:

curl -XGET 'localhost:9200/myindex/_analyze?analyzer=mynewanalyser&pretty' -d 'Samsung Galaxy S3'
{
  "tokens" : [ {
    "token" : "samsung",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "galaxy",
    "start_offset" : 8,
    "end_offset" : 14,
    "type" : "word",
    "position" : 2
  }, {
    "token" : "s",
    "start_offset" : 15,
    "end_offset" : 16,
    "type" : "word",
    "position" : 3
  }, {
    "token" : "3",
    "start_offset" : 16,
    "end_offset" : 17,
    "type" : "word",
    "position" : 4
  } ]
}
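Because "S3" and "s 3" now produce the same tokens, the query from the question can line up with the intended document. The sketch below illustrates this in Python under several assumptions: the Name field is analysed with this analyser at both index and search time, `[\W_]+` approximates `[^\p{L}\d]+`, and `names_matching_all_terms` is a hypothetical helper that requires every query token to be present. It ignores Elasticsearch scoring and fuzziness entirely, so it only shows the token overlap, not the real ranking.

```python
import re

# Assumption: Python stand-in for the Java pattern used by "mynewanalyser";
# [\W_]+ approximates [^\p{L}\d]+ because re does not support \p{L}.
SPLIT = re.compile(r"[\W_]+|(?<=\D)(?=\d)|(?<=\d)(?=\D)")

# Sample values of the Name field, taken from the question.
NAMES = [
    "Samsung Galaxy S3",
    "Samsung Galaxy Ace Duos 3",
    "Samsung Galaxy Duos 3",
    "Samsung Galaxy S2",
    "Samsung Galaxy S (I9000)",
]


def analyze(text):
    """Lowercase and split, dropping the empty strings re.split can emit."""
    return [tok for tok in SPLIT.split(text.lower()) if tok]


def names_matching_all_terms(query, names):
    """Hypothetical helper: keep names whose tokens include every query token."""
    wanted = set(analyze(query))
    return [name for name in names if wanted <= set(analyze(name))]


print(analyze("Samsung Galaxy S3"))  # ['samsung', 'galaxy', 's', '3']
print(analyze("galaxy s 3"))         # ['galaxy', 's', '3']
print(names_matching_all_terms("galaxy s 3", NAMES))  # ['Samsung Galaxy S3']
```

The same holds for the other example in the question: "I-phone 5s" and "I-phone 5 s" both reduce to the tokens i, phone, 5, s.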
