How to split a field into words with an ingest pipeline in Kibana
I created an ingest pipeline, shown below, to split a field into words:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing",
"processors": [
{
"split": {
"field": "foo",
"separator": "|"
}
}
]
},
"docs": [
{
"_source": {
"foo": "apple|time"
}
}
]
}
But it splits the field into characters:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"foo" : [
"a",
"p",
"p",
"l",
"e",
"|",
"t",
"i",
"m",
"e"
]
}
}
}
]
}
If I replace the separator with a comma, the same pipeline splits the field into words:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing",
"processors": [
{
"split": {
"field": "foo",
"separator": ","
}
}
]
},
"docs": [
{
"_source": {
"foo": "apple,time"
}
}
]
}
The output is then:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"foo" : [
"apple",
"time"
]
}
}
}
]
}
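How can I split the field into words when the separator is "|"?
My next question is: how can I apply this ingest pipeline to an existing index?
I tried one approach, but it did not work for me.
Edit
This is the whole pipeline, including a test document, that assigns the two parts to two fields: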
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": """combined fields are text that contain "|" to separate two fields""",
"processors": [
{
"split": {
"field": "dv_m",
"separator": "|",
"target_field": "dv_m_splited"
}
},
{
"set": {
"field": "dv_metric_prod",
"value": "{{dv_m_splited.1}}",
"override": false
}
},
{
"set": {
"field": "dv_metric_section",
"value": "{{dv_m_splited.2}}",
"override": false
}
}
]
},
"docs": [
{
"_source": {
"dv_m": "amaze_inc|Understanding"
}
}
]
}
It generates this response:
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"dv_metric_prod" : "m",
"dv_m_splited" : [
"a",
"m",
"a",
"z",
"e",
"_",
"i",
"n",
"c",
"|",
"U",
"n",
"d",
"e",
"r",
"s",
"t",
"a",
"n",
"d",
"i",
"n",
"g"
],
"dv_metric_section" : "a",
"dv_m" : "amaze_inc|Understanding"
},
"_ingest" : {
"timestamp" : "2021-08-02T08:33:58.2234143Z"
}
}
}
]
}
If I set "separator": "\\|", then I get this error:
{
"docs" : [
{
"error" : {
"root_cause" : [
{
"type" : "general_script_exception",
"reason" : "Error running com.github.mustachejava.codes.DefaultMustache@776f8239"
}
],
"type" : "general_script_exception",
"reason" : "Error running com.github.mustachejava.codes.DefaultMustache@776f8239",
"caused_by" : {
"type" : "mustache_exception",
"reason" : "Failed to get value for dv_m_splited.2 @[query-template:1]",
"caused_by" : {
"type" : "mustache_exception",
"reason" : "2 @[query-template:1]",
"caused_by" : {
"type" : "index_out_of_bounds_exception",
"reason" : "2"
}
}
}
}
}
]
}
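The solution is fairly simple: just escape the separator.
The separator field of the split processor is a regular expression, so special characters such as | must be escaped. An unescaped | is the regex alternation operator with two empty alternatives, so it matches the empty string between every pair of characters, which is why the field was split into characters.
You also need to escape it twice: the JSON string "\\|" is parsed into the two-character regex \|, which matches a literal pipe.
So your code is only missing the double-escaping part:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "String cutting processing",
"processors": [
{
"split": {
"field": "foo",
"separator": "\\|"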
}
}
]
},
"docs": [
{
"_source": {
"foo": "apple|time"
}
}
]
}
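To see why the unescaped separator behaves this way, here is a minimal Java sketch. It assumes the split processor treats the separator as a Java regular expression in the same way `String.split` does; the class and method names are made up for illustration and are not part of Elasticsearch.

```java
import java.util.Arrays;

// Hypothetical demo class, not Elasticsearch code.
class SplitSeparatorDemo {

    // Splits a value on a separator that is interpreted as a regular expression.
    static String[] splitOn(String value, String separatorRegex) {
        return value.split(separatorRegex);
    }

    public static void main(String[] args) {
        // Unescaped "|" is an alternation of two empty patterns, so it matches
        // the empty string at every position and yields single characters.
        System.out.println(Arrays.toString(splitOn("apple|time", "|")));
        // prints [a, p, p, l, e, |, t, i, m, e]

        // The Java literal "\\|" is the two-character regex \| (a literal pipe),
        // just as the JSON string "\\|" is after parsing.
        System.out.println(Arrays.toString(splitOn("apple|time", "\\|")));
        // prints [apple, time]
    }
}
```

Java source and JSON happen to share the same rule here: one backslash is consumed by the string literal, and the remaining one reaches the regex engine.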
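Update
You did not mention, or I missed, the part where you want to assign the values to two separate fields.
In that case, you should use dissect instead of split. It is shorter, simpler, and cleaner. See the dissect processor documentation.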
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": """combined fields are text that contain "|" to separate two fields""",
"processors": [
{
"dissect": {
"field": "dv_m",
"pattern": "%{dv_metric_prod}|%{dv_metric_section}"
}
}
]
},
"docs": [
{
"_source": {
"dv_m": "amaze_inc|Understanding"
}
}
]
}
Result
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_type" : "_doc",
"_id" : "_id",
"_source" : {
"dv_metric_prod" : "amaze_inc",
"dv_metric_section" : "Understanding",
"dv_m" : "amaze_inc|Understanding"
},
"_ingest" : {
"timestamp" : "2021-08-18T07:39:12.84910326Z"
}
}
}
]
}
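Appendix
If you use split instead of dissect, your array indices are wrong. There is no {{dv_m_splited.2}}, because array indices start at 0 and the split produces only two results.
This is the correct pipeline when using the split processor: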
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": """combined fields are text that contain "|" to separate two fields""",
"processors": [
{
"split": {
"field": "dv_m",
"separator": "\\|",
"target_field": "dv_m_splited"
}
},
{
"set": {
"field": "dv_metric_prod",
"value": "{{dv_m_splited.0}}",
"override": false
}
},
{
"set": {
"field": "dv_metric_section",
"value": "{{dv_m_splited.1}}",
"override": false
}
}
]
},
"docs": [
{
"_source": {
"dv_m": "amaze_inc|Understanding"
}
}
]
}
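The indexing point can be sketched in Java as well. This is only an illustration of zero-based indexing after a split on an escaped pipe; the class and method names are hypothetical, and the field names are copied from the pipeline above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class, not Elasticsearch code.
class SplitAssignDemo {

    // Splits "prod|section" and assigns the two parts to named fields.
    static Map<String, String> assignParts(String combined) {
        String[] parts = combined.split("\\|"); // escaped pipe: literal separator
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("dv_metric_prod", parts[0]);    // first part is index 0
        fields.put("dv_metric_section", parts[1]); // second part is index 1
        // parts[2] would throw ArrayIndexOutOfBoundsException: only two parts exist.
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(assignParts("amaze_inc|Understanding"));
        // prints {dv_metric_prod=amaze_inc, dv_metric_section=Understanding}
    }
}
```

Like the pipeline, this sketch assumes the input always contains exactly one pipe; a value without one would fail at index 1.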