Elasticsearch Filebeat ignores custom index template and overwrites the output index's mapping with the default filebeat index template

What are you trying to do?

Ingest input data from JSON files in ndjson format with Filebeat's filestream input and insert it into my_index in Elasticsearch without any extra keys.


Show me your configs.

elasticsearch.yml

# ---------------------------------- Cluster -----------------------------------
#
cluster.name: masterCluster
#
# ------------------------------------ Node ------------------------------------
#
node.name: masterNode
#
#----------------------- BEGIN SECURITY AUTO CONFIGURATION -----------------------

# Security features
xpack.security.enabled: false
xpack.security.enrollment.enabled: false

xpack.security.http.ssl.enabled: false
xpack.security.transport.ssl.enabled: false

#----------------------- END SECURITY AUTO CONFIGURATION -------------------------

filebeat.yml

# ============================== Filebeat inputs ===============================

filebeat.inputs:

- type: filestream

  enabled: true

  paths:
    - /home/asura/EBK/data/*.json

  parsers:
    - ndjson:
        keys_under_root: true
        add_error_key: true

# ======================= Elasticsearch template setting =======================

setup.ilm.enabled: false

setup.template:
  name: "my_index_template"
  pattern: "my_index*"

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:

  hosts: ["localhost:9200"]
  index: "my_index"


What do my_index and my_index_template look like?

Mapping of my_index in Kibana:

{
  "mappings": {}
}

Preview of my_index_template in Kibana:

{
  "template": {
    "settings": {
      "index": {
        "routing": {
          "allocation": {
            "include": {
              "_tier_preference": "data_content"
            }
          }
        }
      }
    },
    "aliases": {},
    "mappings": {}
  }
}
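
For illustration only (the field names come from the sample input below, but the types chosen here are my assumptions, not from the original post), explicit mappings could be added to my_index_template through the _index_template API, since its preview above shows empty mappings:

```
PUT _index_template/my_index_template
{
  "index_patterns": ["my_index*"],
  "template": {
    "mappings": {
      "properties": {
        "filename":   { "type": "keyword" },
        "frame":      { "type": "long" },
        "Class":      { "type": "keyword" },
        "confidence": { "type": "long" }
      }
    }
  }
}
```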

What does your input file look like?

input.json

{"filename" :"16.avi", "frame": 131, "Class":"person", "confidence":32, "Date & Time" :"Thu Oct 3 14:02:41 2019", "Others" :"Blue"}
{"filename" :"16.avi", "frame": 131, "Class":"person", "confidence":36, "Date & Time" :"Thu Oct 3 14:02:41 2019", "Others" :"Grey,Blue"}

I drop the file above into the watched folder and the insertion works fine.
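
Note that the search response below also contains a third document with "Error decoding JSON: EOF" and an empty message: with add_error_key: true, a plausible cause is a blank (e.g. trailing empty) line in the ndjson file being decoded as an empty message and indexed as an error event. A small sketch (file paths are assumptions) that strips blank lines before dropping a file into the watched folder:

```shell
# Sketch with assumed paths: remove blank lines from an ndjson file so the
# ndjson parser never sees an empty message.
printf '{"a":1}\n\n{"b":2}\n' > /tmp/input.json        # sample with a blank line
grep -v '^[[:space:]]*$' /tmp/input.json > /tmp/input.clean.json
wc -l < /tmp/input.clean.json                          # two lines remain
```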


What does the data look like after inserting into Elasticsearch?

GET request: http://<host>:<my_port>/my_index/_search?filter_path=hits.hits._source

Response:

{
  "hits": {
    "hits": [
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "log": {
            "offset": 0,
            "file": {
              "path": "/home/asura/EBK/data/input.json"
            }
          },
          "frame": 131,
          "Class": "person",
          "input": {
            "type": "filestream"
          },
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "agent": {
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab",
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha",
            "type": "filebeat",
            "version": "8.1.3"
          },
          "Date & Time": "Thu Oct 3 14:02:41 2019",
          "Others": "Blue",
          "filename": "16.avi",
          "confidence": 32
        }
      },
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "agent": {
            "type": "filebeat",
            "version": "8.1.3",
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab",
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha"
          },
          "Others": "Grey,Blue",
          "filename": "16.avi",
          "input": {
            "type": "filestream"
          },
          "frame": 131,
          "Class": "person",
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "confidence": 36,
          "log": {
            "offset": 133,
            "file": {
              "path": "/home/asura/EBK/data/input.json"
            }
          },
          "Date & Time": "Thu Oct 3 14:02:41 2019"
        }
      },
      {
        "_source": {
          "@timestamp": "2022-04-21T21:49:04.084Z",
          "input": {
            "type": "filestream"
          },
          "agent": {
            "id": "c6cb1ce5-ff92-499d-9e3c-e79478795fca",
            "name": "pisacha",
            "type": "filebeat",
            "version": "8.1.3",
            "ephemeral_id": "d389a35d-40f7-4680-a485-8e6939d011ab"
          },
          "ecs": {
            "version": "8.0.0"
          },
          "host": {
            "name": "pisacha"
          },
          "message": "",
          "error": {
            "type": "json",
            "message": "Error decoding JSON: EOF"
          }
        }
      }
    ]
  }
}

It is not using the template I specified.


And, surprisingly:

Preview of my_index in Kibana after Filebeat inserted the data:

{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "Class": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Date & Time": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Others": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "agent": {
        "properties": {
          "ephemeral_id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "id": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "confidence": {
        "type": "long"
      },
      "ecs": {
        "properties": {
          "version": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "error": {
        "properties": {
          "message": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "filename": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "frame": {
        "type": "long"
      },
      "host": {
        "properties": {
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "input": {
        "properties": {
          "type": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      },
      "log": {
        "properties": {
          "file": {
            "properties": {
              "path": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          },
          "offset": {
            "type": "long"
          }
        }
      },
      "message": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

The mappings in my_index_template are enormous, tens of thousands of lines long; it is almost as if it contains every field that fields.yml has. It also created a data_stream named my_index for it by default.

Even after setting setup.ilm.enabled: false, the data still gets inserted with all the fields listed in the default filebeat index template. I have searched and tried everything I could think of; I need guidance from someone who isn't shooting in the dark.

Versions used for Elasticsearch, Kibana and Filebeat: 8.1.3. Please comment if you need more information :)

References:

  1. Parsing ndjson: https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#_parsers
  2. Using a custom index: https://www.elastic.co/guide/en/beats/filebeat/current/elasticsearch-output.html#index-option-es
  3. Using a custom template: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-template.html
  4. For the filtered response: https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#common-options-response-filtering

TL;DR

I am not sure whether there is an option that stops Filebeat from adding these fields.

But you can add a drop_fields processor to the output to remove them.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

- type: filestream

  enabled: true

  paths:
    - /home/asura/EBK/data/*.json

  parsers:
    - ndjson:
        keys_under_root: true
        add_error_key: true

# ======================= Elasticsearch template setting =======================

setup.ilm.enabled: false

setup.template:
  name: "my_index_template"
  pattern: "my_index*"

# ---------------------------- Elasticsearch Output ----------------------------
output.elasticsearch:

  hosts: ["localhost:9200"]
  index: "my_index"
  processors:
  - drop_fields:
      fields: ["agent", "ecs", "host", ...]

If there were an option that completely disables Beats from adding some of these fields in the first place, that would be the better choice. I just don't know of one.
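
One related knob that does exist for the template half of the problem (a sketch, not part of the original answer): setup.template.enabled: false stops Filebeat from installing or overwriting any index template, leaving a hand-managed my_index_template alone. It does not remove the per-event fields, which is what the processors below are for.

```yaml
# Sketch: prevent Filebeat from installing/overwriting an index template.
# my_index_template can then be created and maintained directly in
# Elasticsearch (via Kibana or the _index_template API).
setup.template.enabled: false
setup.ilm.enabled: false
```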


EDIT:

The full working solution involves globally declared processors.

filebeat.inputs:
- type: filestream

  # Input processors act during the input stage of the processing pipeline
  processors:
  - drop_fields:
      fields: ["key1","key2"]

# ---------------------------- Global Processors ------------------
# Global processors for fields that are added later by filebeat
processors:
- drop_fields:
    fields: ["agent", "ecs", "input", "log", "host"]
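
A possible refinement (ignore_missing is a documented drop_fields option, though it was not part of the original answer): tolerate events that lack some of the listed fields, such as the error documents produced by blank or malformed lines. Note that drop_fields cannot remove @timestamp and type.

```yaml
# Sketch: drop Filebeat metadata fields, skipping events that lack them.
processors:
- drop_fields:
    fields: ["agent", "ecs", "input", "log", "host"]
    ignore_missing: true
```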

Reference:

https://discuss.elastic.co/t/filebeat-didnt-drop-some-of-the-fields-like-agent-ecs-etc/243911/2