Logstash Multiple Log Formats

So we're looking at some sort of log aggregator, since having logs scattered all over the place doesn't scale. I've been looking at Logstash, and last night I was able to get an instance up and running with kibana, but there are some problems. For example, geoip was using our domain name from the httpd (I assume these are apache) logs.

Anyway, now I want to open it up to the rest of our web server logs, but there's something I can't figure out: do I need to define patterns for every differently formatted log we use? How is this usually handled: one big logstash.conf file, or some other way?

PS: I realize some of these logs are similar; for example, the error_log files all have nearly the same format, as do the access_logs. So I assume something like this would handle all the *error_log files:

input { 
    file {
        path => "//var/log/httpd/*error_log"
        type => "error_log"
    }
}

filter {
    if [type] == "error_log" {
        grok {
            match => [ "message", "%{COMBINEDAPACHELOG}" ]
        }
    }
}

Anyway, here are sample lines from each of the logs I want to import.

var/log/httpd/access_log:
207.46.13.87 support.mycompany.com - - [15/Mar/2015:07:49:12 -0400] "GET / HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

var/log/httpd/api-access_log:
192.168.1.5 api.mycompany.com - - [15/Mar/2015:06:50:01 -0400] "GET /diag/heartbeat HTTP/1.0" 502 495 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/api-error_log:
[Sun Mar 15 08:45:06 2015] [error] [client 192.168.1.5] proxy: Error reading from remote server returned by /diag/heartbeat

var/log/httpd/audit_log:
type=USER_END msg=audit(1426380301.674:2285509): user pid=30700 uid=0 auid=0 msg='PAM: session close acct="root" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'

var/log/httpd/default-access_log:
74.77.76.4 dc01.mycompany.com - - [15/Mar/2015:09:33:46 -0400] "GET /prod/shared/700943_image003.jpg HTTP/1.1" 200 751 "http://mail.twc.com/do/mail/message/view?msgId=INBOXDELIM18496" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"

var/log/httpd/error_log:
[Sun Mar 15 13:54:16 2015] [error] [client 107.72.162.115] File does not exist: /var/www/html/portal-prod/apple-touch-icon.png

var/log/httpd/portal-prod-access_log:
192.168.1.5 portal.mycompany.com - - [15/Mar/2015:04:15:02 -0400] "GET /index.php/account/process_upload_file?upload_file=T702135.0315.txt HTTP/1.0" 200 9 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/ssl_access_log:
97.77.91.2 - - [15/Mar/2015:10:00:07 -0400] "POST /prod/index.php/api/uploader HTTP/1.1" 200 10

var/log/httpd/ssl_error_log:
[Sun Mar 15 09:00:03 2015] [error] [client 99.187.226.241] client denied by server configuration: /var/www/html/api

var/log/httpd/ssl_request_log:
[15/Mar/2015:11:10:02 -0400] dc01.mycompany.com 216.240.171.98 TLSv1 RC4-MD5 "POST /prod/index.php/api/uploader HTTP/1.1" 7

var/log/httpd/support-access_log:
209.255.201.30 support.mycompany.com - - [15/Mar/2015:04:07:51 -0400] "GET /cron/index.php?/Parser/ParserMinute/POP3IMAP HTTP/1.0" 200 360 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/support-error_log:
[Sun Mar 15 04:05:43 2015] [warn] RSA server certificate CommonName (CN) `portal.mycompany.com' does NOT match server name!?

var/log/httpd/web-prod-access_log:
62.210.141.227 www.mycompany.com - - [15/Mar/2015:04:38:30 -0400] "HEAD /lib/uploadify/uploadify.swf HTTP/1.1" 404 - "http://www.mycompany.com/lib/uploadify/uploadify.swf" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

var/log/httpd/web-prod-error_log:
[Sun Mar 15 04:38:30 2015] [error] [client 62.210.141.227] File does not exist: /var/www/html/web-prod/lib, referer: http://www.mycompany.com/lib/uploadify/uploadify.swf

var/log/cron:
Mar 15 04:30:01 lilo crond[22758]: (root) CMD (/opt/mycompany/bin/check_replication.sh)

var/log/mysqld.log:
150314  5:07:34 [ERROR] Slave SQL: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'my_database'. Query: 'insert into some_table (column_names) values (values)', Error_code: 1213

var/log/openvpn.log:
Sun Mar 15 13:19:31 2015 Re-using SSL/TLS context
Sun Mar 15 12:23:40 2015 don/50.182.238.21:43315 Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 1024 bit RSA

var/log/maillog:
Mar 15 05:26:45 lilo postfix/qmgr[4428]: 70460B04004: removed
Mar 15 07:06:40 lilo postfix/smtpd[31732]: connect from boots[192.168.1.4]

codeigniter_logs:
DEBUG - 2015-03-15 14:48:29 --> Session class already loaded. Second attempt ignored.
DEBUG - 2015-03-15 14:48:29 --> Helper loaded: url_helper

You'll need a different grok pattern for each differently formatted log file. Being smart about using [type] to run these conditionally is a good idea, since it cuts down on processing.
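
For example, the apache error_log lines above won't match COMBINEDAPACHELOG at all, so they need their own pattern. A minimal sketch, built from core grok patterns (field names like loglevel and errormsg are my own choices, not anything standard):

filter {
    if [type] == "error_log" {
        grok {
            # Matches e.g. [Sun Mar 15 13:54:16 2015] [error] [client 107.72.162.115] File does not exist: ...
            # The "[client x.x.x.x]" part is optional, so [warn] lines without it still match.
            match => [ "message", "\[(?<timestamp>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})\] \[%{LOGLEVEL:loglevel}\] (?:\[client %{IPORHOST:clientip}\] )?%{GREEDYDATA:errormsg}" ]
        }
    }
}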

If you have logs that share the same "prefix" (like the syslog date/time/priority), you can pull that out in one grok first, and then look for the specific stuff in what's left, as sketched below.
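
As an illustration, a sketch based on the maillog samples above (the "syslog" type and all the field names here are my own placeholders):

filter {
    if [type] == "syslog" {
        # First pass: peel off the shared syslog prefix into its own fields.
        grok {
            match => [ "message", "%{SYSLOGTIMESTAMP:ts} %{SYSLOGHOST:logsource} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:rest}" ]
        }
        # Second pass: match program-specific details against the remainder only.
        if [program] == "postfix/qmgr" {
            grok {
                match => [ "rest", "%{WORD:queue_id}: %{GREEDYDATA:qmgr_status}" ]
            }
        }
    }
}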

Note that as the config file grows, you can split it into multiple files on disk. Logstash will combine them (in alphabetical order).
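
For example, the packaged init scripts read everything under /etc/logstash/conf.d/ by default, and a numeric prefix gives you control over the alphabetical merge order. The file names here are just a hypothetical layout:

/etc/logstash/conf.d/
    10-inputs.conf           # all input {} blocks
    20-filter-apache.conf    # apache grok/date/geoip filters
    21-filter-syslog.conf    # cron/maillog/openvpn filters
    90-outputs.conf          # elasticsearch output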

So one part that was tripping me up was the geoip filter when parsing lines with the COMBINEDAPACHELOG pattern:

192.168.1.5 portal.mycompany.com - - [15/Mar/2015:04:15:02 -0400] "GET /index.php/account/process_upload_file?upload_file=T702135.0315.txt HTTP/1.0" 200 9 "-" "Wget/1.11.4 Red Hat modified"

It would take the IP for portal.mycompany.com and use that to determine the location. (Grok patterns aren't anchored, so COMBINEDAPACHELOG was matching from the virtual host onward and clientip ended up holding the hostname instead of the leading client IP.) Using the pattern "%{IP:clientip} %{COMBINEDAPACHELOG}" takes care of that.

Here's my filter section:

filter {
    if [type] == "apache" {
        if [path] =~ "access" and [path] !~ "ssl_access" {
            mutate { replace => { type => "apache_access" } }
            grok { match => { "message" => "%{IP:clientip} %{COMBINEDAPACHELOG}" } }
            #grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
            date {
                locale => "en"
                match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
            }
        } else if [path] =~ "ssl_access" {
            mutate { replace => { type => "apache_access" } }
            grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
            date {
                locale => "en"
                match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
            }
        } else if [path] =~ "error" {
            mutate { replace => { type => "apache_error" } }
        }
    }

    if [agent] != "" {
        useragent { source => "agent" }
    }

    geoip { source => "clientip" }
}

Being very specific in the input section helps a lot too. I still need to set up a redis instance to ship logs from our other DC to this box, but so far it's been performing great.
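
By "specific" I mean assigning a type per path right at the input, so the filters can branch cheaply on [type]. A sketch of what that can look like (only the "apache" type is taken from the filter above; "syslog" and "mysql" are placeholder names):

input {
    file {
        # One type per log family, matched on by the filter conditionals.
        path => [ "/var/log/httpd/*access_log", "/var/log/httpd/*error_log" ]
        type => "apache"
    }
    file {
        path => [ "/var/log/maillog", "/var/log/cron" ]
        type => "syslog"
    }
    file {
        path => "/var/log/mysqld.log"
        type => "mysql"
    }
}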

I do wish there were a pre-packaged ELK stack that included Kibana 4, though. The UI is much cleaner than Kibana 3's.