Elasticsearch custom analyzer being ignored
I'm using Elasticsearch 2.2.0 and I'm trying to apply the lowercase + asciifolding filters to a field.
This is the output of http://localhost:9200/myindex/:
{
  "myindex": {
    "aliases": {},
    "mappings": {
      "products": {
        "properties": {
          "fold": {
            "analyzer": "folding",
            "type": "string"
          }
        }
      }
    },
    "settings": {
      "index": {
        "analysis": {
          "analyzer": {
            "folding": {
              "token_filters": [
                "lowercase",
                "asciifolding"
              ],
              "tokenizer": "standard",
              "type": "custom"
            }
          }
        },
        "creation_date": "1456180612715",
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "uuid": "vBMZEasPSAyucXICur3GVA",
        "version": {
          "created": "2020099"
        }
      }
    },
    "warmers": {}
  }
}
When I try to test the folding custom analyzer with the _analyze API, this is the output I get from http://localhost:9200/myindex/_analyze?analyzer=folding&text=%C3%89sta%20est%C3%A1%20loca:
{
  "tokens": [
    {
      "end_offset": 4,
      "position": 0,
      "start_offset": 0,
      "token": "Ésta",
      "type": "<ALPHANUM>"
    },
    {
      "end_offset": 9,
      "position": 1,
      "start_offset": 5,
      "token": "está",
      "type": "<ALPHANUM>"
    },
    {
      "end_offset": 14,
      "position": 2,
      "start_offset": 10,
      "token": "loca",
      "type": "<ALPHANUM>"
    }
  ]
}
As you can see, the tokens returned are Ésta, está, loca rather than esta, esta, loca. What's going on? It's as if the folding analyzer is being ignored.
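As a quick sanity check, you can also run the tokenizer and filters directly through the _analyze API, independent of any index settings, to see what the expected output should be (this uses the URL-style `tokenizer`/`filters` parameters from the 2.x _analyze API; the parameter names may differ in other versions):

```
GET /_analyze?tokenizer=standard&filters=lowercase,asciifolding&text=Ésta está loca
```

If this returns esta, esta, loca while the index-based call does not, the problem is in how the analyzer is registered on the index rather than in the filters themselves.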
This looks like a simple typo made when the index was created.
In your "analysis":{"analyzer":{...}} block, this:
"token_filters": [...]
should be
"filter": [...]
Check the documentation to confirm this. Because your filter array isn't named correctly, ES ignores it entirely and simply falls back to the standard analyzer. Here is a small example written with the Sense Chrome plugin. Execute the requests in order:
DELETE /test
PUT /test
{
  "analysis": {
    "analyzer": {
      "folding": {
        "type": "custom",
        "filter": [
          "lowercase",
          "asciifolding"
        ],
        "tokenizer": "standard"
      }
    }
  }
}
GET /test/_analyze
{
  "analyzer": "folding",
  "text": "Ésta está loca"
}
And the result of the final GET /test/_analyze:
"tokens": [
{
"token": "esta",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "esta",
"start_offset": 5,
"end_offset": 9,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "loca",
"start_offset": 10,
"end_offset": 14,
"type": "<ALPHANUM>",
"position": 2
}
]
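Note that analyzer definitions on an existing index generally can't be changed while the index is open, so the rename has to be applied by recreating the index (and reindexing its documents). A sketch of the corrected creation request for the original myindex, combining the settings and mapping from the question with the `filter` fix, might look like this:

```
DELETE /myindex
PUT /myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "products": {
      "properties": {
        "fold": {
          "type": "string",
          "analyzer": "folding"
        }
      }
    }
  }
}
```

After recreating the index this way, the original _analyze call with analyzer=folding should return the lowercased, folded tokens.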