将 Logstash 与 HTML 日志一起使用
Use Logstash with HTML log
我是 Logstash 的新手,试图用它来解析 HTML 日志文件。
我只需要输出日志行,即忽略前面的 JS,CSS 和 HTML,它们也包含在文件中。
文件中的日志行如下所示:
<tr bgcolor="tomato"><td>Jan 28<br>13:52:25.692</td><td>Jan 28<br>13:52:23.950</td><td>qtp114615276-1648 [POST] [call_id:-8009072655119858507]</td><td>REST</td><td>sa</td><td>0.0.0.0</td><td>ERR</td><td>ProjectValidator.validate(36)</td><td>Project does not exist</td></tr>
我获取所有行没有问题,但我想要一个只包含相关行的输出,没有 HTML 标签,看起来像这样:
{
"db_timestamp": "2015-01-28 13:52:25.692",
"server_timestamp": "2015-01-28 13:52:25.950",
"node": "qtp114615276-1648 [POST] [call_id:-8009072655119858507]",
"thread": "REST",
"user": "sa",
"ip": "0.0.0.0",
"level": "ERR",
"method": "ProjectValidator.validate(36)",
"message": "Project does not exist"
}
我的 Logstash 配置是:
input {
file {
type => "request"
path => "<some path>/*.log"
start_position => "beginning"
}
file {
type => "log"
path => "<some path>/*.html"
start_position => "beginning"
}
}
filter {
if [type] == "log" {
grok {
match => [ WHAT SHOULD I PUT HERE??? ]
}
}
}
output {
stdout {}
if [type] == "request" {
http {
http_method => "post"
url => "http://<some url>"
mapping => ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
}
}
if [type] == "log" {
http {
http_method => "post"
url => "http://<some url>"
mapping => [ ALSO WHAT SHOULD I PUT HERE??? ]
}
}
}
有办法吗?到目前为止,我还没有找到任何相关的文档或示例。
谢谢!
终于找到答案了
不确定这是最好的还是最优雅的解决方案,但它确实有效。
我将 http 输出格式更改为 "message",这使我能够覆盖整个消息并将其格式化为 JSON,而不是使用映射。此外,了解如何在 grok 过滤器中命名参数并在输出中使用它们。
这是新的 Logstash 配置文件:
input {
file {
type => "request"
path => "<some path>/*.log"
start_position => "beginning"
}
file {
type => "log"
path => "<some path>/*.html"
start_position => "beginning"
}
}
filter {
if [type] == "log" {
grok {
match => { "message" => "<tr bgcolor=.*><td>%{MONTH:db_date}%{SPACE}%{MONTHDAY:db_date}<br>%{TIME:db_date}</td><td>%{MONTH:alm_date}%{SPACE}%{MONTHDAY:alm_date}<br>%{TIME:alm_date}</td><td>%{DATA:thread}</td><td>%{DATA:req_type}</td><td>%{DATA:username}</td><td>%{IP:ip}</td><td>%{DATA:level}</td><td>%{DATA:method}</td><td>%{DATA:err_message}</td></tr>" }
}
}
}
output { stdout { codec => rubydebug }
if [type] == "request" {
http {
http_method => "post"
url => "http://<some URL>"
mapping => ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
}
}
if [type] == "log" {
http {
format => "message"
content_type => "application/json"
http_method => "post"
url => "http://<some URL>"
message=> '{
"db_date":"%{db_date}",
"alm_date":"%{alm_date}",
"thread": "%{thread}",
"req_type": "%{req_type}",
"username": "%{username}",
"ip": "%{ip}",
"level": "%{level}",
"method": "%{method}",
"message": "%{err_message}"
}'
}
}
}
注意 http 消息块的单引号和该块内参数的双引号。
对于解析 HP ALM 日志的任何人,以下 Logstash 过滤器将完成工作:
grok {
break_on_match => true
match => [ "message", "<tr bgcolor=.*><td>%{MONTH:db_date_mon}%{SPACE}%{MONTHDAY:db_date_day}<br>%{TIME:db_date_time}<\/td><td>%{MONTH:alm_date_mon}%{SPACE}%{MONTHDAY:alm_date_day}<br>%{TIME:alm_date_time}<\/td><td>(?<thread_col1>.*?)<\/td><td>(?<request_type>.*?)<\/td><td>(?<login>.*?)<\/td><td>(?<ip>.*?)<\/td><td>(?<level>.*?)<\/td><td>(?<method>.*?)<\/td><td>(?m:(?<log_message>.*?))</td></tr>" ]
}
mutate {
add_field => ["db_date", "%{db_date_mon} %{db_date_day}"]
add_field => ["alm_date", "%{alm_date_mon} %{alm_date_day}"]
remove_field => [ "db_date_mon", "db_date_day", "alm_date_mon", "alm_date_day" ]
gsub => [
"log_message", "<br>", "
"
]
gsub => [
"log_message", "<p>", " "
]
}
已通过 Logstash 2.4.0 测试并正常工作
我是 Logstash 的新手,试图用它来解析 HTML 日志文件。 我只需要输出日志行,即忽略前面的 JS,CSS 和 HTML,它们也包含在文件中。 文件中的日志行如下所示:
<tr bgcolor="tomato"><td>Jan 28<br>13:52:25.692</td><td>Jan 28<br>13:52:23.950</td><td>qtp114615276-1648 [POST] [call_id:-8009072655119858507]</td><td>REST</td><td>sa</td><td>0.0.0.0</td><td>ERR</td><td>ProjectValidator.validate(36)</td><td>Project does not exist</td></tr>
我获取所有行没有问题,但我想要一个只包含相关行的输出,没有 HTML 标签,看起来像这样:
{
"db_timestamp": "2015-01-28 13:52:25.692",
"server_timestamp": "2015-01-28 13:52:25.950",
"node": "qtp114615276-1648 [POST] [call_id:-8009072655119858507]",
"thread": "REST",
"user": "sa",
"ip": "0.0.0.0",
"level": "ERR",
"method": "ProjectValidator.validate(36)",
"message": "Project does not exist"
}
我的 Logstash 配置是:
input {
file {
type => "request"
path => "<some path>/*.log"
start_position => "beginning"
}
file {
type => "log"
path => "<some path>/*.html"
start_position => "beginning"
}
}
filter {
if [type] == "log" {
grok {
match => [ WHAT SHOULD I PUT HERE??? ]
}
}
}
output {
stdout {}
if [type] == "request" {
http {
http_method => "post"
url => "http://<some url>"
mapping => ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
}
}
if [type] == "log" {
http {
http_method => "post"
url => "http://<some url>"
mapping => [ ALSO WHAT SHOULD I PUT HERE??? ]
}
}
}
有办法吗?到目前为止,我还没有找到任何相关的文档或示例。
谢谢!
终于找到答案了
不确定这是最好的还是最优雅的解决方案,但它确实有效。
我将 http 输出格式更改为 "message",这使我能够覆盖整个消息并将其格式化为 JSON,而不是使用映射。此外,了解如何在 grok 过滤器中命名参数并在输出中使用它们。
这是新的 Logstash 配置文件:
input {
file {
type => "request"
path => "<some path>/*.log"
start_position => "beginning"
}
file {
type => "log"
path => "<some path>/*.html"
start_position => "beginning"
}
}
filter {
if [type] == "log" {
grok {
match => { "message" => "<tr bgcolor=.*><td>%{MONTH:db_date}%{SPACE}%{MONTHDAY:db_date}<br>%{TIME:db_date}</td><td>%{MONTH:alm_date}%{SPACE}%{MONTHDAY:alm_date}<br>%{TIME:alm_date}</td><td>%{DATA:thread}</td><td>%{DATA:req_type}</td><td>%{DATA:username}</td><td>%{IP:ip}</td><td>%{DATA:level}</td><td>%{DATA:method}</td><td>%{DATA:err_message}</td></tr>" }
}
}
}
output { stdout { codec => rubydebug }
if [type] == "request" {
http {
http_method => "post"
url => "http://<some URL>"
mapping => ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
}
}
if [type] == "log" {
http {
format => "message"
content_type => "application/json"
http_method => "post"
url => "http://<some URL>"
message=> '{
"db_date":"%{db_date}",
"alm_date":"%{alm_date}",
"thread": "%{thread}",
"req_type": "%{req_type}",
"username": "%{username}",
"ip": "%{ip}",
"level": "%{level}",
"method": "%{method}",
"message": "%{err_message}"
}'
}
}
}
注意 http 消息块的单引号和该块内参数的双引号。
对于解析 HP ALM 日志的任何人,以下 Logstash 过滤器将完成工作:
grok {
break_on_match => true
match => [ "message", "<tr bgcolor=.*><td>%{MONTH:db_date_mon}%{SPACE}%{MONTHDAY:db_date_day}<br>%{TIME:db_date_time}<\/td><td>%{MONTH:alm_date_mon}%{SPACE}%{MONTHDAY:alm_date_day}<br>%{TIME:alm_date_time}<\/td><td>(?<thread_col1>.*?)<\/td><td>(?<request_type>.*?)<\/td><td>(?<login>.*?)<\/td><td>(?<ip>.*?)<\/td><td>(?<level>.*?)<\/td><td>(?<method>.*?)<\/td><td>(?m:(?<log_message>.*?))</td></tr>" ]
}
mutate {
add_field => ["db_date", "%{db_date_mon} %{db_date_day}"]
add_field => ["alm_date", "%{alm_date_mon} %{alm_date_day}"]
remove_field => [ "db_date_mon", "db_date_day", "alm_date_mon", "alm_date_day" ]
gsub => [
"log_message", "<br>", "
"
]
gsub => [
"log_message", "<p>", " "
]
}
已通过 Logstash 2.4.0 测试并正常工作