How to filter JSON data from a log4j file using logstash?
I have a log file that looks like this.
2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken
2014-12-24 09:41:29,383 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] CSRFToken set successfully.
2014-12-24 09:44:26,607 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] in getCSRFToken
2014-12-24 09:44:26,609 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] CSRFToken set successfully.
2014-12-26 09:55:28,399 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] in getCSRFToken
2014-12-26 09:55:28,401 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] CSRFToken set successfully.
2014-12-26 11:10:32,135 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] in getCSRFToken
2014-12-26 11:10:32,136 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] CSRFToken set successfully.
2014-12-26 11:12:40,500 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] in getCSRFToken
2014-12-26 11:12:40,501 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] CSRFToken set successfully.
2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {"createdTime":1448880669029,"modifiedTime":null,"active":true,"deleted":false,"deletedOn":-1,"guid":null,"uuid":null,"id":130771,"instanceId":130665,"pos":"","channel":"Web","flightNo":"TWBL2DL2","orig":"BLR","dest":"DEL","cabCls":"ECONOMY","logCls":"Y","noOfPaxs":1,"scheduleEntryId":130661,"travelDateTime":[2015,12,1,21,30],"enquiryDateTime":[2015,11,30,16,21,9,23000000]}
Note that the last line contains some JSON data.
I am trying to configure Logstash to extract this JSON data.
Here is my Logstash config file:
input {
  file {
    path => "C:/Users/TESTER/Desktop/files/test1.log"
    type => "test"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => [ "message" , "timestamp : %{DATESTAMP:timestamp}", "severity: %{WORD:severity}", "clazz: %{JAVACLASS:clazz}", "selco: %{NOTSPACE:selco}", "testerField: (?<ENQDTLS>EnquiryDetails :)"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "test1"
  }
  stdout {}
}
However, this is the output I get from Logstash:
C:\logstash-2.0.0\bin>logstash -f test1.conf
io/console not supported; tty will not be manipulated
Default settings used: Filter workers: 2
Logstash startup completed
2016-01-08T08:02:02.029Z TW 2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-24 09:44:26,607 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-24 09:44:26,609 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 09:55:28,399 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-26 09:55:28,401 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 11:10:32,135 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-26 11:10:32,136 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-24 09:41:29,383 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 11:12:40,500 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {"createdTime":1448880669029,"modifiedTime":null,"active":true,"deleted":false,"deletedOn":-1,"guid":null,"uuid":null,"id":130771,"instanceId":130665,"pos":"","channel":"Web","flightNo":"TWBL2DL2","orig":"BLR","dest":"DEL","cabCls":"ECONOMY","logCls":"Y","noOfPaxs":1,"scheduleEntryId":130661,"travelDateTime":[2015,12,1,21,30],"enquiryDateTime":[2015,11,30,16,21,9,23000000]}
2016-01-08T08:02:02.029Z TW 2014-12-26 11:12:40,501 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] CSRFToken set successfully.
Can anyone tell me what I'm doing wrong? Thanks.
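You didn't say what exactly is "wrong", but let's assume you're concerned about the fields missing from your output.
First, use the rubydebug or json codec in your stdout{} output stanza. It will show you more detail, including every field on each event.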
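A minimal sketch of that change, assuming the rest of the config stays as posted:

```
output {
  stdout {
    # rubydebug prints each event as a structured hash, one field per line
    codec => rubydebug
  }
}
```

With the plain stdout {} only the timestamp, host and raw message are printed, so there is no way to tell from the console whether grok added any fields.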
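Second, your grok{} looks like a mess. grok{} takes an input field and one or more regular expressions to apply against that input. You've given it an input ("message"), but this regexp: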
"timestamp : %{DATESTAMP:timestamp}"
won't match your input, because your log lines don't contain the literal string "timestamp : ".
You need something more like:
"%{DATESTAMP} %{WORD:severity}" (etc)
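I would recommend setting up one grok{} stanza to pull off all the common stuff (everything up to the ]). Then use another one to deal with the different types of messages.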
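As a rough sketch of that two-stage idea (not a tested config): the field names thread and msg are made up for illustration, and TIMESTAMP_ISO8601 and LOGLEVEL are swapped in for DATESTAMP and WORD on the assumption that they fit the "yyyy-MM-dd HH:mm:ss,SSS LEVEL" prefix more directly.

```
filter {
  # Stage 1: fields shared by every line, up to the closing "]".
  # "thread" and "msg" are hypothetical field names.
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} %{JAVACLASS:clazz} \[%{DATA:thread}\] %{GREEDYDATA:msg}" }
  }
  # Stage 2: only lines whose remainder starts with the EnquiryDetails marker.
  if [msg] =~ /^EnquiryDetails :/ {
    grok {
      match => { "msg" => "^EnquiryDetails : %{GREEDYDATA:JSONDATA}" }
    }
    # Parse the captured text into individual event fields.
    json {
      source => "JSONDATA"
    }
  }
}
```

Keeping the common prefix in its own stanza means other message types can later get their own conditional branch without repeating the timestamp and class patterns.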
I found a solution to my problem.
input {
  file {
    path => "C:/Users/TESTER/Desktop/elk Files 8-1-2015/test1.log"
    start_position => "beginning"
  }
}
filter {
  # Match only the EnquiryDetails lines; the tag is added only when grok succeeds.
  grok {
    match => {"message" => "%{DATESTAMP:timestamp} %{WORD:severity} %{JAVACLASS:clazz} %{NOTSPACE:selco} (?<ENQDTLS>EnquiryDetails :) (?<JSONDATA>.*)"}
    add_tag => ["ENQDTLS"]
  }
  # Drop every line that grok did not tag.
  if "ENQDTLS" not in [tags] {
    drop { }
  }
  mutate {
    remove_tag => ["ENQDTLS"]
  }
  # Expand the captured JSON text into event fields.
  json {
    source => "JSONDATA"
  }
  # Remove the intermediate grok captures once the JSON is parsed.
  mutate {
    remove_field => ["timestamp", "clazz", "selco", "severity", "ENQDTLS", "JSONDATA"]
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "test3"
  }
  stdout {
    codec => rubydebug
  }
}
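What I've done here is use grok to filter out any line that does not contain the keyword "EnquiryDetails", and then process the JSON data in the lines that remain.
I hope this helps others who may run into the same problem.
Also, since I'm new to this, I'd like to know whether this is a good approach.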
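For comparison, here is one possible way to tighten the same filter, offered as a sketch rather than a verified improvement. It relies on grok's default behaviour of tagging non-matching events with _grokparsefailure, and on remove_field being applied by a filter only when that filter succeeds; the grok pattern is the one from the config above with the ENQDTLS capture left out.

```
filter {
  grok {
    match => { "message" => "%{DATESTAMP:timestamp} %{WORD:severity} %{JAVACLASS:clazz} %{NOTSPACE:selco} EnquiryDetails : (?<JSONDATA>.*)" }
  }
  # Lines without the EnquiryDetails payload fail the match and are dropped.
  if "_grokparsefailure" in [tags] {
    drop { }
  }
  json {
    source => "JSONDATA"
    # Cleanup runs only if the JSON parsed, so bad payloads stay inspectable.
    remove_field => ["timestamp", "clazz", "selco", "severity", "JSONDATA"]
  }
}
```

This avoids the custom tag and the two mutate blocks, at the cost of depending on the default failure tag name.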