Use Logstash with HTML logs

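I'm new to Logstash and am trying to use it to parse an HTML log file. I only need the log lines in the output, i.e. the JS, CSS and HTML that precede them in the same file should be ignored. A log line in the file looks like this: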

<tr bgcolor="tomato"><td>Jan 28<br>13:52:25.692</td><td>Jan 28<br>13:52:23.950</td><td>qtp114615276-1648 [POST] [call_id:-8009072655119858507]</td><td>REST</td><td>sa</td><td>0.0.0.0</td><td>ERR</td><td>ProjectValidator.validate(36)</td><td>Project does not exist</td></tr>

I have no trouble getting all the lines, but I want output that contains only the relevant lines, without the HTML tags, looking like this:

{
  "db_timestamp": "2015-01-28 13:52:25.692",
  "server_timestamp": "2015-01-28 13:52:23.950",
  "node": "qtp114615276-1648 [POST] [call_id:-8009072655119858507]",
  "thread": "REST",
  "user": "sa",
  "ip": "0.0.0.0",
  "level": "ERR",
  "method": "ProjectValidator.validate(36)",
  "message": "Project does not exist"
}

My Logstash configuration is:

input {
  file {
    type => "request"
    path => "<some path>/*.log"
    start_position => "beginning"
  }
  file {
    type => "log"
    path => "<some path>/*.html"
    start_position => "beginning"
  }
}
filter {
  if [type] == "log" {
    grok {
        match => [ WHAT SHOULD I PUT HERE??? ]  
    }
  }
}
output {
  stdout {}
  if [type] == "request" {
    http {
        http_method => "post"
        url => "http://<some url>"
        mapping =>  ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
    }
  }
  if [type] == "log" {
    http {
        http_method => "post"
        url => "http://<some url>"
        mapping =>  [ ALSO WHAT SHOULD I PUT HERE??? ]
    }
  }
}

Is there a way to do this? So far I haven't found any relevant documentation or examples.

Thanks!

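Finally found the answer.

I'm not sure it's the best or most elegant solution, but it works.

I changed the http output format to "message", which let me override the entire message and format it as JSON myself instead of using mapping. I also learned how to name the captures in the grok filter and reference them in the output.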
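As a minimal sketch of that naming mechanism: in a grok pattern, `%{PATTERN:name}` stores the matched text in the event field `name`, and `%{name}` inside an output string is replaced with that field's value. The field names `month` and `day` and the shortened pattern below are illustrative only and are not part of the configuration in this answer.

```
filter {
  grok {
    # %{MONTH:month} matches e.g. "Jan" and stores it in the field "month";
    # %{MONTHDAY:day} does the same for the day of the month.
    match => { "message" => "<td>%{MONTH:month}%{SPACE}%{MONTHDAY:day}<br>" }
  }
}

output {
  # %{month} and %{day} are replaced with the values captured by grok.
  stdout { codec => line { format => "captured: %{month} %{day}" } }
}
```

The same `%{name}` references are what the `message` option of the http output relies on.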

Here is the new Logstash configuration file:

input {
  file {
    type => "request"
    path => "<some path>/*.log"
    start_position => "beginning"
  }
  file {
    type => "log"
    path => "<some path>/*.html"
    start_position => "beginning"
  }
}

filter {
  if [type] == "log" {
    grok {
            match => { "message" => "<tr bgcolor=.*><td>%{MONTH:db_date}%{SPACE}%{MONTHDAY:db_date}<br>%{TIME:db_date}</td><td>%{MONTH:alm_date}%{SPACE}%{MONTHDAY:alm_date}<br>%{TIME:alm_date}</td><td>%{DATA:thread}</td><td>%{DATA:req_type}</td><td>%{DATA:username}</td><td>%{IP:ip}</td><td>%{DATA:level}</td><td>%{DATA:method}</td><td>%{DATA:err_message}</td></tr>" }
    }
  }
}

output {
  stdout { codec => rubydebug }
  if [type] == "request" {
    http {
        http_method => "post"
        url => "http://<some URL>"
        mapping =>  ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
    }
  }
  if [type] == "log" {
    http {
        format => "message"
        content_type => "application/json"
        http_method => "post"
        url => "http://<some URL>"
        message => '{
            "db_date":"%{db_date}", 
            "alm_date":"%{alm_date}", 
            "thread": "%{thread}", 
            "req_type": "%{req_type}", 
            "username": "%{username}", 
            "ip": "%{ip}",
            "level": "%{level}",
            "method": "%{method}",
            "message": "%{err_message}"         
        }'
    }
  }
}

Note the single quotes around the http message block and the double quotes around the parameters inside it.
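The question also asked for the JS, CSS and HTML lines in the file to be ignored. One possible way to do that, sketched here under the assumption that grok's default failure tag `_grokparsefailure` has not been changed, is to drop every "log" event that the pattern did not match. This block would need to come after the grok filter:

```
filter {
  # Lines that are not log rows (JS, CSS, other HTML) fail the grok match
  # and are tagged "_grokparsefailure" by default; discard those events.
  if [type] == "log" and "_grokparsefailure" in [tags] {
    drop { }
  }
}
```

Dropped events never reach the output section, so only the rows that parsed successfully are posted.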

For anyone parsing HP ALM logs, the following Logstash filter will do the job:

    grok {
        break_on_match => true
        match => [ "message", "<tr bgcolor=.*><td>%{MONTH:db_date_mon}%{SPACE}%{MONTHDAY:db_date_day}<br>%{TIME:db_date_time}<\/td><td>%{MONTH:alm_date_mon}%{SPACE}%{MONTHDAY:alm_date_day}<br>%{TIME:alm_date_time}<\/td><td>(?<thread_col1>.*?)<\/td><td>(?<request_type>.*?)<\/td><td>(?<login>.*?)<\/td><td>(?<ip>.*?)<\/td><td>(?<level>.*?)<\/td><td>(?<method>.*?)<\/td><td>(?m:(?<log_message>.*?))</td></tr>" ]
    }
    mutate {
        add_field => ["db_date", "%{db_date_mon} %{db_date_day}"]
        add_field => ["alm_date", "%{alm_date_mon} %{alm_date_day}"]
        remove_field => [ "db_date_mon", "db_date_day", "alm_date_mon", "alm_date_day" ]
        gsub => [
           "log_message", "<br>", "
           "
           ]
        gsub => [
           "log_message", "<p>", "   "
           ]
    }

Tested and working with Logstash 2.4.0.