Parsing out awkward JSON in Logstash
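Afternoon,
I've been trying to sort this out for the past few weeks and can't find a solution. We receive some logs via a third party, and so far I've used grok to pull the value below out into a details field. Annoyingly, this would be really simple if it weren't for all the backslashes.
Is there an easy way to parse this data out as JSON in Logstash?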
{\"CreationTime\":\"2021-05-11T06:42:44\",\"Id\":\"xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx\",\"Operation\":\"SearchMtpBatch\",\"OrganizationId\":\"xxxxxxxxx-xxx-xxxx-xxxx-xxxxxxx\",\"RecordType\":52,\"UserKey\":\"eample@example.onmicrosoft.com\",\"UserType\":5,\"Version\":1,\"Workload\":\"SecurityComplianceCenter\",\"UserId\":\"example@example.onmicrosoft.com\",\"AadAppId\":\"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx\",\"DataType\":\"MtpBatch\",\"DatabaseType\":\"DataInsights\",\"RelativeUrl\":\"/DataInsights/DataInsightsService.svc/Find/MtpBatch?tenantid=xxxxxxx-xxxxx-xxxx-xxx-xxxxxxxx&PageSize=200&Filter=ModelType+eq+1+and+ContainerUrn+eq+%xxurn%xAZappedUrlInvestigation%xxxxxxxxxxxxxxxxxxxxxx%xx\",\"ResultCount\":\"1\"}
You can achieve this easily with the json filter:
filter {
  json {
    source => "message"
  }
}
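If your source data actually contains those backslashes, then you need to remove them somehow before Logstash can recognize the message as valid JSON.
You could do that before it reaches Logstash, in which case the json codec would probably work as expected. Or, if you want Logstash to handle it, you can use the gsub option of the mutate filter, followed by the json filter to parse the result: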
filter {
  mutate {
    # "[\\]" is a regex character class that matches a single literal backslash
    gsub => ["message", "[\\]", "" ]
  }
  json {
    source => "message"
  }
}
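A couple of things to note: this just blindly strips out all backslashes. If your string values might ever legitimately contain backslashes, you'll need to do something a bit more sophisticated. I've had trouble escaping backslashes in gsub before, and have found that using the regex "any of" construct ([]) is safer.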
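To illustrate what "more sophisticated" could look like if you clean the data before it reaches Logstash, here is a minimal Python sketch. It is not part of the Logstash config above. It assumes the third party escaped the document consistently, as if it were the body of a JSON string literal; the function names and the `Path` sample are made up for the example.

```python
import json


def strip_then_parse(raw: str) -> dict:
    # Same idea as the gsub approach: drop every backslash, then parse.
    return json.loads(raw.replace("\\", ""))


def unescape_then_parse(raw: str) -> dict:
    # Assumption: raw is the body of a JSON string literal, so wrapping it
    # in quotes and decoding once undoes the escaping without losing
    # backslashes that belong to the data.
    inner = json.loads('"' + raw + '"')
    # The decoded text is the original JSON document; parse it normally.
    return json.loads(inner)


# Hypothetical escaped payload whose real value is the Windows path C:\Temp
raw = r'{\"Path\":\"C:\\\\Temp\",\"Count\":1}'

print(strip_then_parse(raw))     # {'Path': 'C:Temp', 'Count': 1}
print(unescape_then_parse(raw))  # {'Path': 'C:\\Temp', 'Count': 1}
```

The first call loses the backslash in the path, while the second keeps it. If the escaping is not consistent (for example, bare quotes mixed in), the second approach raises a `json.JSONDecodeError` instead of silently producing wrong data.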
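Here's a docker one-liner to run that config. The stdin input and stdout output are the defaults when you use -e to specify the config on the command line, so I've omitted them here for readability: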
docker run --rm -it docker.elastic.co/logstash/logstash:7.12.1 -e 'filter { mutate { gsub => ["message", "[\\]", "" ]} json { source => "message" } }'
Pasting your example in and hitting return produces the following output:
{
"@timestamp" => 2021-05-13T01:57:40.736Z,
"RelativeUrl" => "/DataInsights/DataInsightsService.svc/Find/MtpBatch?tenantid=xxxxxxx-xxxxx-xxxx-xxx-xxxxxxxx&PageSize=200&Filter=ModelType+eq+1+and+ContainerUrn+eq+%xxurn%xAZappedUrlInvestigation%xxxxxxxxxxxxxxxxxxxxxx%xx",
"OrganizationId" => "xxxxxxxxx-xxx-xxxx-xxxx-xxxxxxx",
"UserKey" => "eample@example.onmicrosoft.com",
"DataType" => "MtpBatch",
"message" => "{\"CreationTime\":\"2021-05-11T06:42:44\",\"Id\":\"xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx\",\"Operation\":\"SearchMtpBatch\",\"OrganizationId\":\"xxxxxxxxx-xxx-xxxx-xxxx-xxxxxxx\",\"RecordType\":52,\"UserKey\":\"eample@example.onmicrosoft.com\",\"UserType\":5,\"Version\":1,\"Workload\":\"SecurityComplianceCenter\",\"UserId\":\"example@example.onmicrosoft.com\",\"AadAppId\":\"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx\",\"DataType\":\"MtpBatch\",\"DatabaseType\":\"DataInsights\",\"RelativeUrl\":\"/DataInsights/DataInsightsService.svc/Find/MtpBatch?tenantid=xxxxxxx-xxxxx-xxxx-xxx-xxxxxxxx&PageSize=200&Filter=ModelType+eq+1+and+ContainerUrn+eq+%xxurn%xAZappedUrlInvestigation%xxxxxxxxxxxxxxxxxxxxxx%xx\",\"ResultCount\":\"1\"}",
"UserType" => 5,
"UserId" => "example@example.onmicrosoft.com",
"type" => "stdin",
"host" => "de2c988c09c7",
"@version" => "1",
"Operation" => "SearchMtpBatch",
"AadAppId" => "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx",
"ResultCount" => "1",
"DatabaseType" => "DataInsights",
"Version" => 1,
"RecordType" => 52,
"CreationTime" => "2021-05-11T06:42:44",
"Id" => "xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx",
"Workload" => "SecurityComplianceCenter"
}