Adding client-unique record to a log event, fluentd side. E.g., using filter

Question

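I managed to get dockerized fluentd TCP logging to run! Meaning: there are remote python containers that use a slightly modified logging.handlers.SocketHandler to send some JSON to fluentd, and it actually arrives there, looking like this: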

2020-08-31T09:06:31+00:00 paws.tcp {"service_uuid":"paws_log","loglvl":"INFO","file":"paws_log.paws_log","line":59,"msg":"Ping log line #2"}

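I have multiple such python containers and want fluentd to add some kind of source id to each log event. Reading the docs made me give the filter -> record mechanism a chance, which led to the following config snippet with a newly added filter block: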

<source>
  @type tcp
  @label stream_paws
  @id paws_tcp
  tag paws.tcp
  port 5170
  bind 0.0.0.0
  # https://docs.fluentd.org/parser/regexp
  <parse>
    @type regexp
    expression /^(?<service_uuid>[a-zA-Z0-9_-]+): (?<logtime>[^\s]+) (?<loglvl>[^\s]+) \[(?<file>[^\]:]+):(?<line>\d+)\]: (?<msg>.*)$/
    time_key logtime
    time_format %H:%M:%S
    types line:integer
  </parse>
</source>

# Add meta data fluentd side.
# https://docs.fluentd.org/deployment/logging
<filter **> # << Does NOT seem to work if kept outside the label-block! Inside is fine.
  @type record_transformer
  <record>
    host "#{Socket.gethostname}"
  </record>
</filter>

<label stream_paws>
  <match paws.tcp>
    @type file
    @id output_paws_tcp
    path /fluentd/log/paws/data/tcp.*.log
    symlink_path /fluentd/log/paws/tcp.log
  </match>
</label>

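I have two questions here:

Above config works if I put the filter-block inside the label-block. But this I do not want to do because I want the filter to act globally. @include directives might offer a work-around here. Anything better?
I suspect "#{Socket.gethostname}" yields information on the fluentd server. However, I want something from the client. Ideally it would include some id that is unique at the docker container level (perhaps the container id, but any old client-unique uuid would do). Do you know of such an attribute that fluentd can access?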

Answer 1

If you are using the fluentd docker logging driver, it already adds container metadata (including the id) to every log record: https://docs.docker.com/config/containers/logging/fluentd/

Above config works if I put the filter-block inside the label-block. But this I do not want to do because I want the filter to act globally. @include directives might offer a work-around here. Anything better?

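A global filter is usually implemented on the server like this. Note that the source here sets no @label: events from a source with @label are routed straight to that label section and skip any top-level filter.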

<source>
...
</source>

<filter **> # filter globally
...
</filter>

<match tag.one>
...
</match>

<match tag.two>
...
</match>

<match **> # the rest
...
</match>

I suspect "#{Socket.gethostname}" yields information on the fluentd server.

Correct, see: https://docs.fluentd.org/filter/record_transformer#example-configurations. This is useful when you also want to track which server processed the log record.
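To illustrate why that value cannot identify the client, here is a minimal Ruby sketch. It assumes the double-quoted "#{...}" expression is evaluated once, in the process that loads the configuration. HOST_AT_LOAD and add_host are hypothetical names, and add_host is only a stand-in for what a record_transformer-style step does, not fluentd code.

```ruby
require 'socket'

# Evaluated once, in the process that loads this file (the "server").
HOST_AT_LOAD = "#{Socket.gethostname}"

# Hypothetical stand-in for a record_transformer-style step.
def add_host(record)
  record.merge('host' => HOST_AT_LOAD)
end

# Two records that, in the scenario above, would come from different clients.
records = [
  { 'service_uuid' => 'paws_log', 'msg' => 'Ping log line #1' },
  { 'service_uuid' => 'other_service', 'msg' => 'Ping log line #2' }
]

stamped = records.map { |r| add_host(r) }

# Number of distinct host values across both records.
puts stamped.map { |r| r['host'] }.uniq.length
```

Both records end up with the same host value, because it describes the machine doing the stamping and not the sender.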

Answer 2

If you are using kubernetes, use the kubernetes metadata filter; it adds the pod details to every log entry.

<filter kubernetes.**>
  @id filter_kubernetes_metadata
  @type kubernetes_metadata
</filter>

For Docker

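I have not really used fluentd before, so apologies for the slightly abstract answer here. But, checking http://docs.fluentd.org/, I guess you are probably using in_tail for the logs? Judging from the example there, you probably want to get the file path into the input message: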

path /path/to/file
tag foo.*

which apparently tags the events with foo.path.to.file
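As a rough illustration of that tagging, the Ruby sketch below approximates how the * in the tag is replaced by the file path with slashes turned into dots. expand_tail_tag is a hypothetical helper written for this answer, not fluentd's actual implementation, and the path is the Docker layout assumed further down.

```ruby
# Rough approximation (an assumption, not fluentd's actual code) of how
# in_tail expands the '*' in a tag such as 'foo.*' from the file path.
def expand_tail_tag(tag_pattern, path)
  # '/a/b/c.log' -> '.a.b.c.log' -> 'a.b.c.log'
  path_part = path.tr('/', '.').squeeze('.').sub(/\A\./, '')
  tag_pattern.sub('*', path_part)
end

tag = expand_tail_tag('foo.*', '/var/lib/docker/containers/ID/ID-json.log')
puts tag
# The sixth dot-separated segment (index 5) is the container directory name.
puts tag.split('.')[5]
```

With that layout, the container ID lands at index 5 of the split tag, which is the index used in the snippets that follow.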

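You can probably use http://docs.fluentd.org/articles/filter_record_transformer with enable_ruby. From the look of it, you can probably take the foo.path.to.file tag and use a bit of ruby to extract the container ID, then parse the container's JSON file.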

For example, testing with the following ruby file, say foo.rb:

require 'json'
tag = 'foo.var.lib.docker.containers.ID.ID-json.log'
id = tag.split('.')[5]  # sixth tag segment is the container ID
puts JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']

where config.v2.json is something like:

{"image":"foo"}

would print

foo
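If you want to try this without touching /var/lib/docker, here is a minimal self-contained variant. It builds a throwaway directory that mimics the layout assumed above, with a literal ID directory and the simplified {"image":"foo"} file. image_for_tag and demo_image_lookup are hypothetical helper names introduced only for this sketch.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: looks up the 'image' value for the container whose ID
# is the sixth dot-separated segment of the tag.
def image_for_tag(tag, containers_root)
  id = tag.split('.')[5]
  config_path = File.join(containers_root, id, 'config.v2.json')
  JSON.parse(File.read(config_path))['image']
end

# Builds a temporary stand-in for /var/lib/docker/containers/ID/ and runs
# the lookup against it. The directory is removed when the block exits.
def demo_image_lookup
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, 'ID'))
    File.write(File.join(root, 'ID', 'config.v2.json'),
               JSON.generate('image' => 'foo'))
    image_for_tag('foo.var.lib.docker.containers.ID.ID-json.log', root)
  end
end

puts demo_image_lookup
```

Passing the containers root as a parameter keeps the lookup testable; on a real host it would be /var/lib/docker/containers, which the fluentd process must be able to read.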

Fluentd probably already includes json for you, so maybe you can leave out the require 'json' bit. Then, in fluentd terms, perhaps you could use something like

<filter>
  @type record_transformer
  enable_ruby
  <record>
    container ${tag.split('.')[5]}
    image ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))['image']}
  </record>
</filter>

In your case, you could use something like the following:

<filter raw.**>
  @type record_transformer
  enable_ruby
  <record>
    container ${id = tag.split('.')[5]; JSON.parse(IO.read("/var/lib/docker/containers/#{id}/config.v2.json"))["Name"][1..-1]}
    hostname "#{Socket.gethostname}"
  </record>
</filter>
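The ["Name"][1..-1] part is there because, as far as I know, Docker stores the container name with a leading slash, and the slice drops that first character. A small Ruby sketch of just that step follows. The sample JSON is a made-up, stripped-down stand-in for config.v2.json, and container_name is a hypothetical helper.

```ruby
require 'json'

# Assumed sample: only the "Name" key matters here, and its value is made up.
sample_config = '{"Name":"/paws_worker_1"}'

# Hypothetical helper: parse the config and drop the leading '/' of the name.
def container_name(config_json)
  JSON.parse(config_json)['Name'][1..-1]
end

puts container_name(sample_config)
```

This prints the name without the leading slash, which is the value the container field would carry.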


fluentd

docker