Parse json in a list in logstash
I have a JSON of the form
[
{
"foo":"bar"
}
]
I am trying to filter it using the json filter in logstash, but it does not seem to work. It looks like I cannot parse a JSON list with the json filter in logstash. Can anyone suggest a workaround?
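For reference, what I tried looks roughly like this (a minimal sketch; message is just the field the file/stdin input writes the raw line into):

filter {
    json {
        source => "message"
    }
}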
UPDATE
My logs:
IP - - 0.000 0.000 [24/May/2015:06:51:13 +0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium+S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT%2B05%3A30&events=%5B%7B%22eV%22%3A%22com.olx.southasia%22%2C%22eC%22%3A%22appUpdate%22%2C%22eA%22%3A%22app_activated%22%2C%22eTz%22%3A%22GMT%2B05%3A30%22%2C%22eT%22%3A%221432386324909%22%2C%22eL%22%3A%22packageName%22%7D%5D * "-" "-" "-"
The URL-decoded version of the above log is:
IP - - 0.000 0.000 [24/May/2015:06:51:13 0000] *"POST /c.gif HTTP/1.1"* 200 4 * user_id=UserID&package_name=SomePackageName&model=Titanium S202&country_code=in&android_id=AndroidID&eT=1432450271859&eTz=GMT+05:30&events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}] * "-" "-" "-"
Please find below my config file for the above log.
filter {
    urldecode {
        field => "message"
    }
    grok {
        match => ["message", '%{IP:clientip}%{GREEDYDATA} \[%{GREEDYDATA:timestamp}\] \*"%{WORD:method}%{GREEDYDATA}']
    }
    kv {
        field_split => "&? "
    }
    json {
        source => "events"
    }
    geoip {
        source => "clientip"
    }
}
I need to parse the events, i.e. events=[{"eV":"com.olx.southasia","eC":"appUpdate","eA":"app_activated","eTz":"GMT+05:30","eT":"1432386324909","eL":"packageName"}]
I assume that your JSON is in a file. You are right, you cannot use the json filter directly on it. You have to use the multiline codec and apply the json filter afterwards.
The following config works for your given input. However, you may have to change it in order to separate the events correctly; that depends on your needs and on the JSON format of your file.
Logstash config:
input {
    file {
        codec => multiline {
            pattern => "^\]"    # Change to separate events
            negate => true
            what => previous
        }
        path => ["/absolute/path/to/your/json/file"]
        start_position => "beginning"
        sincedb_path => "/dev/null" # This is just for testing
    }
}
filter {
    mutate {
        gsub => [ "message", "\[", "" ]
        gsub => [ "message", "\n", "" ]
    }
    json { source => "message" }
}
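With the sample array from the question, the multiline codec joins the lines into one message, the gsubs strip the opening bracket and the newlines so the message becomes {"foo":"bar"}, and the json filter should then leave a field roughly like foo => "bar". The leftover closing "]" line is why the pattern comment says you may need a different way to separate events.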
UPDATE
After your update I think I found the problem. Apparently you get a jsonparsefailure because of the square brackets. As a workaround you can remove them manually. Add the following mutate filter after your kv and before your json filter:
mutate {
    gsub => [ "events", "\]", "" ]
    gsub => [ "events", "\[", "" ]
}
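Putting this together with the config from the question, the filter section would then look roughly like this (only the ordering of the filters matters here; the individual options are unchanged):

filter {
    urldecode {
        field => "message"
    }
    grok {
        match => ["message", '%{IP:clientip}%{GREEDYDATA} \[%{GREEDYDATA:timestamp}\] \*"%{WORD:method}%{GREEDYDATA}']
    }
    kv {
        field_split => "&? "
    }
    mutate {
        gsub => [ "events", "\]", "" ]
        gsub => [ "events", "\[", "" ]
    }
    json {
        source => "events"
    }
    geoip {
        source => "clientip"
    }
}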
UPDATE 2
OK, let's assume your input looks like this:
[{"foo":"bar"},{"foo":"bar1"}]
Here are five options:
Option a) ugly gsub
An ugly workaround would be another gsub:
gsub => [ "event","\},\{",","]
But this removes the inner relations between the objects, so I guess you don't want to do that.
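To see why: applied to the sample input, that gsub turns

[{"foo":"bar"},{"foo":"bar1"}]

into

[{"foo":"bar","foo":"bar1"}]

i.e. a single object with a duplicate key, so the grouping of values per object is lost.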
Option b) split
A nicer approach might be to use the split filter:
split {
    field => "event"
    terminator => ","
}
mutate {
    gsub => [ "event", "\]", "" ]
    gsub => [ "event", "\[", "" ]
}
json {
    source => "event"
}
This produces multiple events. (The first with foo = bar, the second with foo = bar1.)
Option c) mutate split
You might want to have all the values in one logstash event. You can use the mutate filter's split option to generate an array and then parse the JSON for each entry that exists. Unfortunately you have to set a conditional for each entry, because logstash does not support loops in its config.
mutate {
    gsub => [ "event", "\]", "" ]
    gsub => [ "event", "\[", "" ]
    split => [ "event", "," ]
}
json {
    source => "[event][0]"
    target => "[result][0]"
}
if [event][1] {
    json {
        source => "[event][1]"
        target => "[result][1]"
    }
    if [event][2] {
        json {
            source => "[event][2]"
            target => "[result][2]"
        }
    }
    # You would have to add more conditionals if you expect even more dictionaries
}
Option d) Ruby
Following your comment I tried to find a ruby approach. The following works (after your kv filter):
mutate {
    gsub => [ "event", "\]", "" ]
    gsub => [ "event", "\[", "" ]
}
ruby {
    init => "require 'json'"
    code => "
        # split the stripped string into one JSON object per element
        e = event['event'].split(',')
        ary = Array.new
        e.each do |x|
            hash = JSON.parse(x)
            # collect each key/value pair as its own small hash
            hash.each do |key, value|
                ary.push( { key => value } )
            end
        end
        event['result'] = ary
    "
}
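For the sample input [{"foo":"bar"},{"foo":"bar1"}] this should yield an array of one-pair hashes, roughly:

"result" => [
    [0] { "foo" => "bar" },
    [1] { "foo" => "bar1" }
]

Each key/value pair becomes its own small hash, which is the main difference from option e below.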
Option e) Ruby
Use this approach after your kv filter (without the mutate filter above):
ruby {
    init => "require 'json'"
    code => "
        event['result'] = JSON.parse(event['event'])
    "
}
It will parse an event like event=[{"name":"Alex","address":"NewYork"},{"name":"David","address":"NewJersey"}]
into:
"result" => [
[0] {
"name" => "Alex",
"address" => "NewYork"
},
[1] {
"name" => "David",
"address" => "NewJersey"
}
One caveat: because of the kv filter's behavior here (the field_split includes a space), values containing spaces are not supported. I hope you don't have any in your real inputs, do you?