Elasticsearch ignore accents on search

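I have an Elasticsearch index containing customer information.

I'm having trouble getting results for names that contain accents.

For example, I have {name: 'anais'} and {name: 'anaïs'}

Running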

GET /my-index/_search
{
  "size": 25, 
  "query": {
    "match": {"name": "anaïs"}
  }
}

For this query I expect both documents to be returned, but I only get anaïs.

GET /my-index/_search
{
  "size": 25, 
  "query": {
    "match": {"name": "anais"}
  }
}

Here I would like to get both anais and anaïs, but I only get anais.

I tried adding an analyzer:

PUT /my-new-celebrity/_settings
{
  "analysis": {
    "analyzer": {
      "default": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": [
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}

But in this case, both searches return only anais.

It looks like you forgot to apply your custom default analyzer to the name field. Below is a working example:

Index definition with mapping and settings

{
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "asciifolding"
                    ]
                }
            }
        }
    },
    "mappings" : {
        "properties" :{
            "name" : {
                "type" : "text",
                "analyzer" : "default" // note this 
            }
        }
    }
}

Index the sample documents

{
   "name" : "anais"
}

{
   "name" : "anaïs"
}

The search query is the same as yours

{
    "size": 25,
    "query": {
        "match": {
            "name": "anaïs"
        }
    }
}

And it returns both expected results

"hits": [
    {
        "_index": "myindexascii",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.18232156,
        "_source": {
            "name": "anaïs"
        }
    },
    {
        "_index": "myindexascii",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.18232156,
        "_source": {
            "name": "anais"
        }
    }
]
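
To see why both documents match, here is a rough Python sketch of what the lowercase + asciifolding chain does to each name. It is not Elasticsearch or Lucene code: the `fold`, `analyze` and `search` helpers are hypothetical names, whitespace splitting stands in for the standard tokenizer, and Unicode decomposition only approximates the asciifolding filter. The point it illustrates is that the same folding runs when the document is indexed and when the query is analyzed, so both names end up as the single token anais.

```python
import unicodedata


def fold(text):
    """Approximate lowercase + asciifolding: lowercase, decompose, drop accents."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Combining marks (such as the diaeresis on "ï") have a non-zero combining class.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Stand-in for the analyzer: whitespace split (assumption), then fold each token."""
    return [fold(token) for token in text.split()]


# Tiny inverted index: folded token -> ids of the documents containing it.
docs = {1: "anaïs", 2: "anais"}
index = {}
for doc_id, name in docs.items():
    for token in analyze(name):
        index.setdefault(token, set()).add(doc_id)


def search(query):
    """Analyze the query the same way as the documents, then look up each token."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits


print(search("anaïs"))  # {1, 2}
print(search("anais"))  # {1, 2}
```

In this sketch, if a document's tokens had been produced without folding while the query was folded, the accented document would no longer be found, which is why the analyzer needs to be in effect for the name field at index time as well as at search time.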