Elasticsearch - how do I remove s from end of words

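Using Elasticsearch 2.2, as a simple experiment, I want to remove the last character from any word that ends with the lowercase character "s". For example, the word "sounds" would be indexed as "sound".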

I'm defining my analyzer like this:

{
  "template": "document-index-template",
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)([s]( |$))",
          "replacement": ""
        }
      },
      "analyzer": {
        "tight": {
          "type": "standard",
          "filter": [
            "sFilter",
            "lowercase"
          ]
        }
      }
    }
  }
}

Then, when I analyze the text "sounds of silences" using this request:

<index>/_analyze?analyzer=tight&text=sounds%20of%20silences

I get:

{
   "tokens": [
      {
         "token": "sounds",
         "start_offset": 0,
         "end_offset": 6,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "of",
         "start_offset": 7,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "silences",
         "start_offset": 10,
         "end_offset": 18,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

I want "sounds" to become "sound" and "silences" to become "silence".

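The analyzer settings above are invalid. As far as I know, an analyzer of type standard does not take a filter list, so sFilter is never applied, which is why the tokens come back unchanged. I think what you intended to use is an analyzer of type custom with tokenizer set to standard.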

Example:

{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)s$",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "tight": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "sFilter"
          ]
        }
      }
    }
  }
}
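
To get a feel for what a trailing-"s" pattern does to each token before touching the index, here is a rough Python sketch. It is not Elasticsearch: whitespace splitting stands in for the standard tokenizer, the function names are made up for illustration, and the pattern is assumed to be anchored at the end of the token with capture group 1 as the replacement (Java regex writes that back-reference as $1, Python as \1).

```python
import re

# Assumed pattern: one or more letters (group 1) followed by a final "s".
# The "$" anchor keeps an "s" in the middle of a token from being touched.
TRAILING_S = re.compile(r"([a-zA-Z]+)s$")


def strip_trailing_s(token):
    """Drop a single trailing lowercase 's' from one token, if present."""
    return TRAILING_S.sub(r"\1", token)


def analyze(text):
    """Crude stand-in for the 'tight' analyzer: split on whitespace, then filter."""
    return [strip_trailing_s(token) for token in text.split()]


if __name__ == "__main__":
    print(analyze("sounds of silences"))
```

A token filter runs on each token separately, after tokenization, so the pattern never sees the spaces between words. That is why this sketch anchors on the end of the token rather than matching a space.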