How to split a field into words with an ingest pipeline in Kibana

I created an ingest pipeline, shown below, to split a field into words:

POST _ingest/pipeline/_simulate
{
    "pipeline": {
        "description": "String cutting processing",
        "processors": [
            {
                "split": {
                    "field": "foo",
                    "separator": "|"
                }
            }
        ]
    },
    "docs": [
        {
            "_source": {
                "foo": "apple|time"
            }
        }
    ]
}

But it splits the field into characters:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "foo" : [
            "a",
            "p",
            "p",
            "l",
            "e",
            "|",
            "t",
            "i",
            "m",
            "e"
          ]
        }
      }
    }
  ]
}

If I replace the separator with a comma, the same pipeline splits the field into words:

POST _ingest/pipeline/_simulate
{
    "pipeline": {
        "description": "String cutting processing",
        "processors": [
            {
                "split": {
                    "field": "foo",
                    "separator": ","
                }
            }
        ]
    },
    "docs": [
        {
            "_source": {
                "foo": "apple,time"
            }
        }
    ]
}

The output is then:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "foo" : [
            "apple",
            "time"
          ]
        }
      }
    }
  ]
}

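How can I split the field into words when the separator is "|"? My next question is how to apply this ingest pipeline to an existing index. I tried, but it did not work for me.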

Edit

Here is the whole pipeline, with a sample document, that assigns the two parts to two fields:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": """combined fields are text that contain  "|" to separate two fields""",
    "processors": [
      {
        "split": {
          "field": "dv_m",
          "separator": "|",
          "target_field": "dv_m_splited"
        }
      },
      {
        "set": {
          "field": "dv_metric_prod",
          "value": "{{dv_m_splited.1}}",
          "override": false
        }
      },
      {
        "set": {
          "field": "dv_metric_section",
          "value": "{{dv_m_splited.2}}",
          "override": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "dv_m": "amaze_inc|Understanding"
      }
    }
  ]
}

It produces this response:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "dv_metric_prod" : "m",
          "dv_m_splited" : [
            "a",
            "m",
            "a",
            "z",
            "e",
            "_",
            "i",
            "n",
            "c",
            "|",
            "U",
            "n",
            "d",
            "e",
            "r",
            "s",
            "t",
            "a",
            "n",
            "d",
            "i",
            "n",
            "g"
          ],
          "dv_metric_section" : "a",
          "dv_m" : "amaze_inc|Understanding"
        },
        "_ingest" : {
          "timestamp" : "2021-08-02T08:33:58.2234143Z"
        }
      }
    }
  ]
}

If I set "separator": "\\|", then I get this error:

{
  "docs" : [
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "general_script_exception",
            "reason" : "Error running com.github.mustachejava.codes.DefaultMustache@776f8239"
          }
        ],
        "type" : "general_script_exception",
        "reason" : "Error running com.github.mustachejava.codes.DefaultMustache@776f8239",
        "caused_by" : {
          "type" : "mustache_exception",
          "reason" : "Failed to get value for dv_m_splited.2 @[query-template:1]",
          "caused_by" : {
            "type" : "mustache_exception",
            "reason" : "2 @[query-template:1]",
            "caused_by" : {
              "type" : "index_out_of_bounds_exception",
              "reason" : "2"
            }
          }
        }
      }
    }
  ]
}

The solution is fairly simple: escape the separator.

Because the separator field of the split processor is a regular expression, special characters such as | need to be escaped.

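You also need to escape twice: the backslash that escapes | must itself be escaped inside the JSON string, so the separator is written as "\\|".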
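As a rough illustration of both points, the sketch below uses Python's re and json modules as stand-ins. Elasticsearch uses Java regular expressions, so this is an analogy under that assumption, not its implementation. The helper name split_field is hypothetical, and it drops empty pieces so the result lines up with the output shown in the question.

```python
import json
import re


def split_field(value, pattern):
    """Split value on a regex pattern and drop empty pieces."""
    return [piece for piece in re.split(pattern, value) if piece]


# In JSON source text, "\\|" decodes to the two-character regex \|,
# which matches a literal pipe.
separator = json.loads(r'{"separator": "\\|"}')["separator"]

# A single backslash before | is not a valid JSON escape sequence.
try:
    json.loads(r'{"separator": "\|"}')
    single_backslash_ok = True
except json.JSONDecodeError:
    single_backslash_ok = False

escaped = split_field("apple|time", separator)
# Unescaped, | is an alternation of two empty patterns and matches the
# empty string at every position (Python 3.7+ splits on empty matches).
unescaped = split_field("apple|time", "|")

print(escaped)
print(unescaped)
print(single_backslash_ok)
```

The unescaped pattern never consumes the pipe character itself, which is why "|" survives as its own element among the single letters.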

So your code is only missing the double escape:

POST _ingest/pipeline/_simulate
{
    "pipeline": {
        "description": "String cutting processing",
        "processors": [
            {
                "split": {
                    "field": "foo",
                    "separator": "\\|"
                }
            }
        ]
    },
    "docs": [
        {
            "_source": {
                "foo": "apple|time"
            }
        }
    ]
}

Update

Either you did not mention it or I missed it, but you also want to assign the values to two separate fields.

In that case, you should use dissect instead of split. It is shorter, simpler, and cleaner. See the dissect processor documentation.

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": """combined fields are text that contain  "|" to separate two fields""",
    "processors": [
      {
        "dissect": {
          "field": "dv_m",
          "pattern": "%{dv_metric_prod}|%{dv_metric_section}"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "dv_m": "amaze_inc|Understanding"
      }
    }
  ]
}

Result

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_doc",
        "_id" : "_id",
        "_source" : {
          "dv_metric_prod" : "amaze_inc",
          "dv_metric_section" : "Understanding",
          "dv_m" : "amaze_inc|Understanding"
        },
        "_ingest" : {
          "timestamp" : "2021-08-18T07:39:12.84910326Z"
        }
      }
    }
  ]
}

Addendum

If using split instead of dissect

Your array indices are wrong. There is no {{dv_m_splited.2}}, because array indices start at 0 and the split yields only two elements.

Here is the correct pipeline when using the split processor:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": """combined fields are text that contain  "|" to separate two fields""",
    "processors": [
      {
        "split": {
          "field": "dv_m",
          "separator": "\\|",
          "target_field": "dv_m_splited"
        }
      },
      {
        "set": {
          "field": "dv_metric_prod",
          "value": "{{dv_m_splited.0}}",
          "override": false
        }
      },
      {
        "set": {
          "field": "dv_metric_section",
          "value": "{{dv_m_splited.1}}",
          "override": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "dv_m": "amaze_inc|Understanding"
      }
    }
  ]
}
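
The zero-based indexing behind {{dv_m_splited.0}} and {{dv_m_splited.1}} can be sketched in Python, used here only as a stand-in for the template lookup. The variable names mirror the pipeline's fields; this is an illustration, not what Elasticsearch runs.

```python
# Plain string split on a literal pipe (no regex involved here).
parts = "amaze_inc|Understanding".split("|")

dv_metric_prod = parts[0]     # first element, like {{dv_m_splited.0}}
dv_metric_section = parts[1]  # second element, like {{dv_m_splited.1}}

# Two elements means the valid indices are 0 and 1; index 2 is out of range.
try:
    parts[2]
    index_two_exists = True
except IndexError:
    index_two_exists = False

print(dv_metric_prod, dv_metric_section, index_two_exists)
```

This corresponds to the index_out_of_bounds_exception with reason "2" in the error response: the split itself succeeded, and only the lookup of a third element failed.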