How to combine a pattern analyzer and char_filter in elasticsearch
I have a keyword field (comma-separated) that I want to tokenize, but it may also contain values with "+" characters. For example:
query_string.keywords = Living,Music,+concerts+and+live+bands,News,Portland
When creating the index, the following works well for splitting the keywords on commas:
{
  "settings": {
    "number_of_shards": 5,
    "analysis": {
      "analyzer": {
        "happy_tokens": {
          "type": "pattern",
          "pattern": "([,]+)"
        }
      }
    }
  },
  "mappings": {
    "post" : {
      "properties" : {
        "query_string.keywords" : {
          "type": "string",
          "analyzer" : "happy_tokens"
        }
      }
    }
  }
}
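For reference, the output can be checked with the _analyze API; with the sample string above it returns "+concerts+and+live+bands" as a single token, pluses included (a sketch; the index name my_index and the 1.x-style query-string form are assumptions, newer versions expect a JSON body with "analyzer" and "text"):

curl -XGET 'localhost:9200/my_index/_analyze?analyzer=happy_tokens' \
     -d 'Living,Music,+concerts+and+live+bands,News,Portland'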
How can I add a char_filter (see below) to this that changes the + to a space or an empty string?
"char_filter": {
"kill_pluses": {
"type": "pattern_replace",
"pattern": "+",
"replace": ""
}
}
You need to escape the "+", because "+" has a special meaning in regular expressions (and the backslash itself must be escaped inside JSON, hence "\\+"). Note also that the pattern_replace char filter's parameter is "replacement", not "replace":
"char_filter": {
"kill_pluses": {
"type": "pattern_replace",
"pattern": "\+",
"replace": ""
}
}
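On its own this char_filter is not enough, because a char_filter cannot be attached to the built-in pattern analyzer; it has to be referenced from an analyzer of type custom together with a pattern tokenizer. A rough sketch of how the pieces could be combined (the tokenizer name split_on_comma is just an illustrative choice):

"analysis": {
  "char_filter": {
    "kill_pluses": {
      "type": "pattern_replace",
      "pattern": "\\+",
      "replacement": ""
    }
  },
  "tokenizer": {
    "split_on_comma": {
      "type": "pattern",
      "pattern": "([,]+)"
    }
  },
  "analyzer": {
    "happy_tokens": {
      "type": "custom",
      "char_filter": ["kill_pluses"],
      "tokenizer": "split_on_comma"
    }
  }
}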
I found that the "mapping" char_filter can convert my plus signs to spaces. After tokenizing, I was able to trim the tokens to remove the whitespace. The custom analyzers page in the Elasticsearch guide was a big help.
My working example is below:
{
  "settings": {
    "number_of_shards": 5,
    "index": {
      "analysis": {
        "char_filter": {
          "plus_to_space": {
            "type": "mapping",
            "mappings": ["+=>\u0020"]
          }
        },
        "tokenizer": {
          "split_on_comma": {
            "type": "pattern",
            "pattern": "([,]+)"
          }
        },
        "analyzer": {
          "happy_tokens": {
            "type": "custom",
            "char_filter": ["plus_to_space"],
            "tokenizer": "split_on_comma",
            "filter": ["trim"]
          }
        }
      }
    }
  },
  "mappings": {
    "post" : {
      "properties" : {
        "query_string.keywords" : {
          "type": "string",
          "analyzer" : "happy_tokens"
        }
      }
    }
  }
}
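Running the same _analyze request from the question against an index created with these settings should now yield clean tokens such as "Living", "Music", "concerts and live bands", "News" and "Portland" (again a sketch, with my_index and the 1.x-style query-string form assumed):

curl -XGET 'localhost:9200/my_index/_analyze?analyzer=happy_tokens' \
     -d 'Living,Music,+concerts+and+live+bands,News,Portland'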