Appending data from MongoDB into log files being processed by Logstash and parsed into Elasticsearch

Question

Sorry about the title; my case really can't be explained in one sentence.

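Here is my situation:

I have a large set of log files (around 4GB) that I want to parse with Logstash for use with the Elastic Stack (Logstash, Elasticsearch, Kibana).
In the logs there is a serial number that I have successfully parsed with Logstash. This number corresponds to an index of a MongoDB collection. As each log is parsed, I want to be able to query the collection with the parsed number and retrieve data that I want to include in the final output passed to Elasticsearch.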

To make things clearer, here is a rough example. Suppose I have the raw log line:

2017-11-20 14:24:14.011 123 log_number_one

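Before the parsed log is sent to Elasticsearch, I want to query my MongoDB collection with 123 and get the data data1 and data2 to append to the document being sent to Elasticsearch, so that my end result has fields resembling the following: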

{
    "timestamp": "2017-11-20 14:24:14.011",
    "serial": 123,
    "data1": "foo",
    "data2": "bar",
    "log": "log_number_one"
}

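I assume the easier way to achieve this would be to simply pre-process the logs and run the numbers through MongoDB before parsing them with Logstash. However, seeing as I have around 4GB of log files, I was hoping for a way to achieve this in one fell swoop. I was wondering whether my edge case could be solved with the ruby filter plugin, where I could run some arbitrary Ruby code to do the above?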
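For illustration, the pre-processing route mentioned above could look roughly like the following Python sketch. It is only a sketch under assumptions: the log layout is taken from the single example line, the names `LOG_PATTERN`, `enrich_line` and `lookup` are made up for this example, and `lookup` is an in-memory stand-in for whatever would actually be fetched from the MongoDB collection.

```python
import json
import re

# Assumed layout, based on the one example line:
# 2017-11-20 14:24:14.011 123 log_number_one
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<serial>\d+) (?P<log>.*)$"
)


def enrich_line(line, lookup):
    """Parse one log line and merge in the fields stored for its serial number."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None  # Line does not follow the assumed layout.
    event = {
        "timestamp": match.group("timestamp"),
        "serial": int(match.group("serial")),
        "log": match.group("log"),
    }
    # Unknown serial numbers leave the event without the extra fields.
    event.update(lookup.get(event["serial"], {}))
    return event


# Hypothetical stand-in for data read from MongoDB, keyed by serial number.
lookup = {123: {"data1": "foo", "data2": "bar"}}
event = enrich_line("2017-11-20 14:24:14.011 123 log_number_one", lookup)
print(json.dumps(event))
```

Writing one JSON object per line like this would let Logstash ingest the enriched events without a per-event database query, at the cost of an extra pass over the 4GB of logs.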

Any help or advice would be much appreciated!

Answer 1

An answer from Elastic team member Christian_Dahlqvist (all credit goes to him):

Depending on the number of records and total size of the data in MongoDB (assuming it is a reasonable size data set), you may be able to extract the data into a file where each serial number is associated with a string representation of the data in JSON form. You could then use the translate filter to populate a field with the serialised JSON based on the serial number and then use a json filter to parse this and add it to the event.
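As a rough illustration of the extraction step described above (not code from the original answer), the Python sketch below builds a lookup file in which each serial number maps to a JSON string. It assumes the MongoDB documents have already been read into plain dicts, for example with a driver such as pymongo, and that they carry a `serial` key alongside `data1` and `data2`. The function names and the sample records are hypothetical.

```python
import json


def build_translate_dictionary(records, key_field="serial", fields=("data1", "data2")):
    """Map each serial number (as a string) to a JSON string of the chosen fields."""
    dictionary = {}
    for record in records:
        payload = {name: record[name] for name in fields if name in record}
        # Keys are strings because the serial parsed from a log line is text.
        dictionary[str(record[key_field])] = json.dumps(payload, sort_keys=True)
    return dictionary


def write_dictionary(dictionary, path):
    """Write the mapping to a JSON file for use as a lookup dictionary."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(dictionary, handle, indent=2, sort_keys=True)


# Hypothetical stand-in for documents read from the MongoDB collection.
sample_records = [
    {"serial": 123, "data1": "foo", "data2": "bar"},
    {"serial": 124, "data1": "baz", "data2": "qux"},
]
translate_dictionary = build_translate_dictionary(sample_records)
```

Calling `write_dictionary(translate_dictionary, "serial_lookup.json")` (the file name is an arbitrary choice) would produce a file that the translate filter could presumably load through its `dictionary_path` option, with the json filter then parsing the populated field into the event.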

Reference: https://discuss.elastic.co/t/appending-data-from-mongodb-into-log-files-being-processed-by-logstash-and-parsed-into-elasticsearch/92564/2

mongodb

logstash

elastic-stack