Elasticsearch - how do I remove s from end of words
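Using Elasticsearch 2.2, as a simple experiment, I want to remove the last character from any word that ends with the lowercase character "s". For example, the word "sounds" would be indexed as "sound".
I'm defining my analyzer like this: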
{
  "template": "document-index-template",
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)([s]( |$))",
          "replacement": ""
        }
      },
      "analyzer": {
        "tight": {
          "type": "standard",
          "filter": [
            "sFilter",
            "lowercase"
          ]
        }
      }
    }
  }
}
Then, when I analyze the phrase "sounds of silences" with this request:
<index>/_analyze?analyzer=tight&text=sounds%20of%20silences
I get:
{
  "tokens": [
    {
      "token": "sounds",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "of",
      "start_offset": 7,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "silences",
      "start_offset": 10,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
I want "sounds" to become "sound" and "silences" to become "silence".
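The analyzer settings above are not valid: an analyzer of type standard does not take a filter list, so sFilter is never applied. I think what you intended is an analyzer of type custom with its tokenizer set to standard.
Example: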
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "sFilter": {
          "type": "pattern_replace",
          "pattern": "([a-zA-Z]+)s$",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "tight": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "sFilter"
          ]
        }
      }
    }
  }
}
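One thing worth checking with a pattern_replace filter is the replacement string. The filter runs on each token separately, after the tokenizer has split the text, and an empty replacement deletes the whole match rather than just the trailing "s". Replacing with the first capture group ($1 in the filter, \1 in Python) keeps the stem, and anchoring the pattern with $ stops it from touching an "s" in the middle of a word. The sketch below only approximates the filter with Python's re module instead of running Elasticsearch; the helper name strip_trailing_s and the whitespace split standing in for the standard tokenizer are assumptions made for illustration.

```python
import re


def strip_trailing_s(token, pattern=r"([a-zA-Z]+)s$", replacement=r"\1"):
    """Apply one regex replacement to a single token.

    This mimics how a token filter sees tokens one at a time.
    It is a hypothetical helper, not part of any Elasticsearch client.
    """
    return re.sub(pattern, replacement, token)


# Crude stand-in for the standard tokenizer: split on whitespace.
tokens = "sounds of silences".split()

# Unanchored pattern with an empty replacement: the whole match is removed,
# so tokens ending in "s" become empty strings.
emptied = [strip_trailing_s(t, r"([a-zA-Z]+)s", "") for t in tokens]

# Anchored pattern that keeps capture group 1: only the final "s" is dropped.
stemmed = [strip_trailing_s(t) for t in tokens]

print(emptied)  # ['', 'of', '']
print(stemmed)  # ['sound', 'of', 'silence']

# Without the $ anchor, an "s" inside a word is also removed.
print(strip_trailing_s("sister", r"([a-zA-Z]+)s"))  # siter
print(strip_trailing_s("sister"))                   # sister
```

To confirm the behavior on a real cluster, rerun the _analyze request from the question against an index created with the custom analyzer.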