SumoLogic - Plotting data from a "status" JSON message in the log

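I have a service that accepts and processes tasks. Each task has a status: queued, running, failed, cancelled, or finished. Every so often the service emits a log entry containing JSON, like this: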

2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 0, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }

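I'd like to plot a pie chart showing the breakdown of tasks by status ("running_tasks", "queued_tasks", "finished_tasks", "failed_tasks", and "cancelled_tasks" in the JSON message). So far I haven't managed to, because I can't figure out how to construct a table from a message like this. Any pointers would be much appreciated. Thanks in advance!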

First, Sumo Logic supports parsing JSON into fields. In your example the whole line is not JSON, only the part after the "-", so you can add this to your query:

...
| parse "INFO - *" as jsonMessage
| json auto

Then you can use running_tasks, queued_tasks, etc. as ordinary fields, for example:

...
| timeslice 1m
| max(running_tasks), max(queued_tasks) by _timeslice
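To sanity-check what that aggregation should produce outside Sumo Logic, here is a rough Python sketch of the same idea: bucket each entry into a one-minute slice and keep the per-slice maximum. The sample rows and the helper name `max_per_minute` are made up for illustration. This only mimics the query and says nothing about how Sumo Logic implements `timeslice` internally.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-parsed samples: (timestamp, running_tasks, queued_tasks).
samples = [
    ("2021-09-09 00:30:46,742", 2, 0),
    ("2021-09-09 00:30:58,100", 4, 1),
    ("2021-09-09 00:31:46,742", 5, 0),
]

def max_per_minute(rows):
    """Group rows into 1-minute buckets and keep the max of each counter."""
    buckets = defaultdict(lambda: [0, 0])  # counters are never negative
    for stamp, running, queued in rows:
        # Truncate the timestamp to the start of its minute.
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        key = ts.replace(second=0, microsecond=0)
        buckets[key][0] = max(buckets[key][0], running)
        buckets[key][1] = max(buckets[key][1], queued)
    return {k: tuple(v) for k, v in sorted(buckets.items())}

for minute, (running, queued) in max_per_minute(samples).items():
    print(minute, running, queued)
```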

Disclaimer: I am currently employed by Sumo Logic.

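Here is a pure Python solution that gets the data into a shape you can plot.

The output (entries) is a dictionary whose keys are timestamps and whose values are dictionaries holding the information of interest. log_lines holds a collection of log messages and serves as the input.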

import json
import pprint

log_lines = [
    '2021-09-09 00:30:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 2, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }',
    '2021-09-09 00:31:46,742 [Timer-0] INFO - { "env": "test_environment", "capacity": 10, "available_ec2": 10, "failed_ec2": 0, "running_tasks": 5, "queued_tasks": 0, "finished_tasks": 0, "failed_tasks": 0, "cancelled_tasks": 3,"queue_wait_minutes" : { "max": 0, "mean": -318990, "max_started": 0, "mean_started": -29715 },"processing_time": {"max": 0, "mean": 0} }'
]
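entries = {}

for line in log_lines:
    # The timestamp is everything before the " [Timer-0]" thread marker.
    date = line[:line.find('[') - 1]
    # The JSON payload starts at the first opening brace.
    data = json.loads(line[line.find('{'):])
    # Keep only the per-status counters, defaulting to 0 if one is missing.
    sub_set = {k: data.get(k, 0) for k in
               ["running_tasks", "queued_tasks", "finished_tasks", "failed_tasks", "cancelled_tasks"]}
    entries[date] = sub_set
pprint.pprint(entries)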

Output

{'2021-09-09 00:30:46,742': {'cancelled_tasks': 3,
                             'failed_tasks': 0,
                             'finished_tasks': 0,
                             'queued_tasks': 0,
                             'running_tasks': 2},
 '2021-09-09 00:31:46,742': {'cancelled_tasks': 3,
                             'failed_tasks': 0,
                             'finished_tasks': 0,
                             'queued_tasks': 0,
                             'running_tasks': 5}}

Try something like this. Basically, you have to transpose the data. I hope this makes sense!

...
| parse field=some_log_line "INFO - *" as jsonMessage
| json field=jsonMessage "running_tasks"
| json field=jsonMessage "queued_tasks"
| json field=jsonMessage "finished_tasks"
| "running_tasks,queued_tasks,finished_tasks," as message_keys
| parse regex field=message_keys "(?<message_key>.*?)," multi
| if (message_key="running_tasks", running_tasks, 0) as message_value
| if (message_key="queued_tasks", queued_tasks, message_value) as message_value
| if (message_key="finished_tasks", finished_tasks, message_value) as message_value
| fields message_key, message_value
| max(message_value) by message_key
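If the transposition in that query is hard to follow, the same reshaping can be sketched in Python: each wide message (one column per status) becomes several rows of (message_key, message_value), which are then aggregated per key. The sample messages and the helper names `transpose` and `max_by_key` are assumptions made for illustration, and only the three statuses from the query are covered.

```python
# Hypothetical parsed status messages in wide form: one column per status.
messages = [
    {"running_tasks": 2, "queued_tasks": 0, "finished_tasks": 0},
    {"running_tasks": 5, "queued_tasks": 1, "finished_tasks": 0},
]
message_keys = ["running_tasks", "queued_tasks", "finished_tasks"]

def transpose(rows, keys):
    """Turn each wide row into one (message_key, message_value) row per key."""
    return [(key, row.get(key, 0)) for row in rows for key in keys]

def max_by_key(long_rows):
    """Similar in spirit to `max(message_value) by message_key`."""
    result = {}
    for key, value in long_rows:
        result[key] = max(value, result.get(key, value))
    return result

long_rows = transpose(messages, message_keys)
table = max_by_key(long_rows)
print(table)
```

The resulting table has one row per status, which is the shape a pie chart needs: the key supplies the slice label and the value supplies the slice size.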