How to filter JSON data from a log4j file using logstash?

I have a log file that looks like this:

2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken
2014-12-24 09:41:29,383 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] CSRFToken set successfully.
2014-12-24 09:44:26,607 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] in getCSRFToken
2014-12-24 09:44:26,609 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] CSRFToken set successfully.
2014-12-26 09:55:28,399 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] in getCSRFToken
2014-12-26 09:55:28,401 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] CSRFToken set successfully.
2014-12-26 11:10:32,135 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] in getCSRFToken
2014-12-26 11:10:32,136 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] CSRFToken set successfully.
2014-12-26 11:12:40,500 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] in getCSRFToken
2014-12-26 11:12:40,501 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] CSRFToken set successfully.
2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {"createdTime":1448880669029,"modifiedTime":null,"active":true,"deleted":false,"deletedOn":-1,"guid":null,"uuid":null,"id":130771,"instanceId":130665,"pos":"","channel":"Web","flightNo":"TWBL2DL2","orig":"BLR","dest":"DEL","cabCls":"ECONOMY","logCls":"Y","noOfPaxs":1,"scheduleEntryId":130661,"travelDateTime":[2015,12,1,21,30],"enquiryDateTime":[2015,11,30,16,21,9,23000000]}

You'll notice that the last line contains some JSON data. I'm trying to configure Logstash to extract this JSON data. Here is my Logstash configuration file:

input {
  file {
    path => "C:/Users/TESTER/Desktop/files/test1.log"
    type => "test"
    start_position => "beginning"
  }
}


filter {
  grok {
    match => [ "message" , "timestamp : %{DATESTAMP:timestamp}", "severity: %{WORD:severity}", "clazz: %{JAVACLASS:clazz}", "selco: %{NOTSPACE:selco}", "testerField: (?<ENQDTLS>EnquiryDetails :)"]
  }
}


output {
  elasticsearch {
    hosts => "localhost"
    index => "test1"
  }
  stdout {}
}

However, this is my Logstash output:

C:\logstash-2.0.0\bin>logstash -f test1.conf
io/console not supported; tty will not be manipulated
Default settings used: Filter workers: 2
Logstash startup completed
2016-01-08T08:02:02.029Z TW 2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-24 09:44:26,607 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-24 09:44:26,609 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 09:55:28,399 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-26 09:55:28,401 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 11:10:32,135 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-26 11:10:32,136 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-24 09:41:29,383 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 11:12:40,500 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {"createdTime":1448880669029,"modifiedTime":null,"active":true,"deleted":false,"deletedOn":-1,"guid":null,"uuid":null,"id":130771,"instanceId":130665,"pos":"","channel":"Web","flightNo":"TWBL2DL2","orig":"BLR","dest":"DEL","cabCls":"ECONOMY","logCls":"Y","noOfPaxs":1,"scheduleEntryId":130661,"travelDateTime":[2015,12,1,21,30],"enquiryDateTime":[2015,11,30,16,21,9,23000000]}
2016-01-08T08:02:02.029Z TW 2014-12-26 11:12:40,501 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] CSRFToken set successfully.

Can anyone tell me what I'm doing wrong? Thanks.

You don't say what you're seeing that's "wrong", but let's assume you're concerned about the fields missing from the output.

First, use the rubydebug or json codec in your stdout{} output stanza. It will show you much more detail about each event.

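Second, your grok{} looks like a mess. grok{} takes one input field and one or more regular expressions to apply to that input. You give it the input ("message"), but this regular expression: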

 "timestamp : %{DATESTAMP:timestamp}"

doesn't match your input, because your lines don't contain the literal string "timestamp : ".
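A quick way to confirm this outside Logstash is a rough Python sketch, assuming plain regexes as stand-ins for the grok patterns (the `DATESTAMP` and `WORD` constants below are hypothetical, simplified approximations; grok's own definitions are broader). The literal-prefix pattern finds nothing in a sample line, while a pattern that matches the fields by position succeeds. As a hedged side note, grok's DATESTAMP is built around US/EU-style dates, so for ISO-style timestamps like these `%{TIMESTAMP_ISO8601}` may be the closer fit; it is worth checking against the pattern files of your Logstash version.

```python
import re

# Hypothetical, simplified stand-ins for the grok patterns; grok's real
# DATESTAMP and WORD definitions are broader than these regexes.
DATESTAMP = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
WORD = r"\w+"

line = ("2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken "
        "[http-bio-8080-exec-4] in getCSRFToken")

# The question's pattern requires the literal text "timestamp : " in the line.
literal_prefix = re.compile(rf"timestamp : (?P<timestamp>{DATESTAMP})")

# A positional pattern matches the fields in the order they appear.
positional = re.compile(rf"(?P<timestamp>{DATESTAMP}) (?P<severity>{WORD})")

print(literal_prefix.search(line))  # None: the line has no "timestamp : "

m = positional.search(line)
print(m.group("timestamp"), m.group("severity"))  # 2014-12-24 09:41:29,383 INFO
```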

You need something more like:

 "%{DATESTAMP} %{WORD:severity}" (etc)

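I'd suggest setting up one grok{} stanza that extracts all the common information (up to the ]), and then using another one to handle the different types of messages.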
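The two-stage idea can be sketched in Python, again with regexes standing in for grok: one header pattern captures everything up to the `]` after the thread name, and a second pattern is applied to the remainder depending on the message type. This is an illustration under those assumptions, not Logstash behavior; the helper `parse` and the field names `thread`, `rest`, `type`, and `enquiry` are hypothetical.

```python
import json
import re

# Stage 1: header common to every line, up to the "]" closing the thread name.
# These regexes are hypothetical approximations of DATESTAMP, WORD, etc.
HEADER = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<severity>\w+) "
    r"(?P<clazz>\S+) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<rest>.*)"
)

# Stage 2: one pattern per message type, applied only to the remainder.
ENQUIRY = re.compile(r"EnquiryDetails : (?P<json>\{.*\})$")


def parse(line):
    """Return a dict of fields, or None if the common header does not match."""
    header = HEADER.match(line)
    if header is None:
        return None
    event = header.groupdict()
    enquiry = ENQUIRY.match(event["rest"])
    if enquiry:
        event["type"] = "enquiry"
        event["enquiry"] = json.loads(enquiry.group("json"))
    else:
        event["type"] = "plain"
    return event


if __name__ == "__main__":
    # JSON payload trimmed from the log line in the question.
    sample = ('2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit '
              '[http-bio-8080-exec-9] EnquiryDetails : {"id":130771,"orig":"BLR","dest":"DEL"}')
    event = parse(sample)
    print(event["type"], event["thread"], event["enquiry"]["dest"])  # enquiry http-bio-8080-exec-9 DEL
```

The design point is that the header pattern is written once, so adding a new message type only means adding one more stage-2 pattern.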

I found a solution to my problem:

input {
  file {
    path => "C:/Users/TESTER/Desktop/elk Files 8-1-2015/test1.log"
    start_position => "beginning"
  }
}


filter {
  grok {
    # Only lines containing "EnquiryDetails :" match; the JSON after it is captured as JSONDATA.
    match => {"message" => "%{DATESTAMP:timestamp} %{WORD:severity} %{JAVACLASS:clazz} %{NOTSPACE:selco} (?<ENQDTLS>EnquiryDetails :) (?<JSONDATA>.*)"}
    # add_tag is applied only when the match succeeds.
    add_tag => [ "ENQDTLS" ]
  }

  # Drop every event that did not match the pattern above.
  if "ENQDTLS" not in [tags] {
    drop { }
  }

  mutate {
    remove_tag => ["ENQDTLS"]
  }

  # Parse the captured JSON; with no target set, its keys go to the top level of the event.
  json {
    source => "JSONDATA"
  }

  # Remove the intermediate fields in a single array.
  mutate {
    remove_field => ["timestamp", "clazz", "selco", "severity", "ENQDTLS", "JSONDATA"]
  }
}


output {
  elasticsearch {
    hosts => "localhost"
    index => "test3"
  }
  stdout {
    codec => rubydebug
  }
}
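To check the logic of this config without running Logstash, here is a hypothetical Python analogue of the same pipeline: match the grok-like pattern, drop lines that don't match, merge the parsed JSON into the event, then remove the intermediate fields. The regex only approximates `DATESTAMP`, `JAVACLASS`, and `NOTSPACE`, and `process` is an illustrative helper, not part of Logstash.

```python
import json
import re

# Hypothetical regex analogue of the grok pattern in the config above.
PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<severity>\w+) "
    r"(?P<clazz>[\w.$]+) "
    r"(?P<selco>\S+) "
    r"(?P<ENQDTLS>EnquiryDetails :) "
    r"(?P<JSONDATA>.*)"
)

DROPPED_FIELDS = ("timestamp", "clazz", "selco", "severity", "ENQDTLS", "JSONDATA")


def process(lines):
    """Mimic grok, drop, json and mutate(remove_field) for each line."""
    events = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            # Equivalent of: if "ENQDTLS" not in [tags] { drop { } }
            continue
        event = {"message": line, **match.groupdict()}
        # Equivalent of: json { source => "JSONDATA" } with no target.
        # json.loads raises on bad input, whereas Logstash would tag the event instead.
        event.update(json.loads(event["JSONDATA"]))
        for field in DROPPED_FIELDS:
            event.pop(field, None)
        events.append(event)
    return events


if __name__ == "__main__":
    log = [
        "2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken",
        # JSON payload trimmed from the log line in the question.
        '2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit '
        '[http-bio-8080-exec-9] EnquiryDetails : {"id":130771,"channel":"Web","orig":"BLR","dest":"DEL"}',
    ]
    result = process(log)
    print(len(result), result[0]["channel"], result[0]["orig"], result[0]["dest"])  # 1 Web BLR DEL
```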

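So what I'm doing here is using grok to drop every line that doesn't match the pattern (that is, every line without the keyword "EnquiryDetails"), and then processing the JSON data in the lines that remain. I hope this helps anyone else who runs into the same problem. Also, since I'm new here, I'd like to know whether this is a good approach.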