Match multiple fields and variable fields in grok

I'm working on a project where I have to scan error log files from an Apache server.

I'm building a grok pattern to parse these error logs.

This is my pattern so far:

\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\]\s\[:%{LOGLEVEL:loglevel}\]\s\[pid %{NUMBER:pid}]\s\[client %{IP:clientip}:.*]\s\[client %{IP:clientip2}.*\]\sModSecurity:\s%{WORD:modSecurity}.\s(%{GREEDYDATA:error}\.\s)\[file\s%{QS:path_file}\]\s\[line %{QS:line}]\s\[id %{QS:id}\]\s\[msg %{QS:message}\]\s{0,1}(\[data %{QS:data}\])\s{0,1}(\[severity %{QS:severity}\])\s{0,1}(\[ver %{QS:ver}\])\s{1,5}(\[tag %{QS:tag})\].*\[hostname %{QS:hostname}\]\s\[uri %{QS:uri}\]\s\[unique_id %{QS:unique_id}\]\,\sreferer:\s%{URI:referer}

Here are two sample lines from the log file:

[Wed Aug 25 12:55:58.601261 2021] [:error] [pid 20282] [client 83.216.165.253:59075] [client 83.216.165.253] ModSecurity: Warning. Match of "rx (?:\\x1f\\x8b\\x08|\\b(?:(?:i(?:nterplay|hdr|d3)|m(?:ovi|thd)|r(?:ar!|iff)|(?:ex|jf)if|f(?:lv|ws)|varg|cws)\\b|gif)|B(?:%pdf|\\.ra)\\b|^wOF(?:F|2))" against "RESPONSE_BODY" required. [file "/usr/share/modsecurity-crs/rules/RESPONSE-953-DATA-LEAKAGES-PHP.conf"] [line "105"] [id "953120"] [msg "PHP source code leakage"] [data "Matched Data: <? found within RESPONSE_BODY: \x03\xa4<\x11U\xb5\x1f\x0e\xa0\x91\xb2p\xfe~\xff\x9b\xa9\xd5\xdd\x97\x13\x0ay\xb1\xa5j\x92\xc2B\xee\xa1S\xdb\x9a^R=[\x9c\xd1\xfb\x04>$\xc4 \xc1\x22@Y\x8aJ\xb7\xfb\x5c\xee\xe3\x93\xd9\xf4\xef\x5cN\xaf\xea\x22\xf3#\x0c\x06\x82\x1b\xf3\xd5-+Y6\x92\xb4\x93e\x18\xd9\x92\x8d\x1ac\xb9\x92\x800\xc1\xff\xff^-\xd3`+\x04\x05\xe5\x84\x80-,\x04T\x98\xc3\xeb\xbd\xf7=\xf0\xa5/P\x81\xc6\x9asb\xd9\x02\xf2x\x80J\xca\xb4{\xef{_#}\xc9Y\xb9\xe4\xac\xcb\xccNm\xb2\xa7oin)\xbd\x10\xe..."] [severity "ERROR"] [ver "OWASP_CRS/3.2.0"] [tag "application-multi"] [tag "language-php"] [tag "platform-multi"] [tag "attack-disclosure"] [tag "OWASP_CRS"] [t [hostname "shop.gnet.it"] [uri "/password"] [unique_id "YSY93rJJTBga6-8ecOI@VAAAAAE"], referer: https://shop.gnet.it/password

And another:

[Wed Aug 25 12:55:58.601666 2021] [:error] [pid 20282] [client 83.216.165.253:59075] [client 83.216.165.253] ModSecurity: Warning. Operator GE matched 4 at TX:outbound_anomaly_score. [file "/usr/share/modsecurity-crs/rules/RESPONSE-959-BLOCKING-EVALUATION.conf"] [line "75"] [id "959100"] [msg "Outbound Anomaly Score Exceeded (Total Score: 4)"] [tag "anomaly-evaluation"] [hostname "shop.gnet.it"] [uri "/password"] [unique_id "YSY93rJJTBga6-8ecOI@VAAAAAE"], referer: https://shop.gnet.it/password

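The "data", "severity", and "ver" fields are not always present in a log line, and the "tag" field is sometimes repeated several times. How can I match the repeated fields, and skip the fields that are absent from a line? The pattern I built above does not work.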

You can use

\[(?<timestamp>%{DAY:day} %{MONTH:month} %{MONTHDAY} %{TIME} %{YEAR})\]\s+\[:%{LOGLEVEL:loglevel}\]\s+\[pid %{NUMBER:pid}]\s+\[client %{IP:clientip}:.*?]\s+\[client %{IP:clientip2}.*?\]\s+ModSecurity:\s+%{WORD:modSecurity}.\s+%{GREEDYDATA:error}\.\s+\[file\s%{QS:path_file}\]\s+\[line %{QS:line}]\s+\[id %{QS:id}\]\s+\[msg %{QS:message}\](?:\s+\[data %{QS:data}\])?(?:\s+\[severity %{QS:severity}\])?(?:\s+\[ver %{QS:ver}\])?\s+\[tag %{QS:tag}\](?:\s+\[tag %{QS:tag2}\])?(?:\s+\[tag %{QS:tag3}\])?(?:\s+\[tag %{QS:tag4}\])?(?:\s+\[tag %{QS:tag5}\])?.*?\[hostname %{QS:hostname}\]\s+\[uri %{QS:uri}\]\s+\[unique_id %{QS:unique_id}\],\s+referer:\s%{URI:referer}

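Here, tag can repeat up to five times (captured as tag, tag2, tag3, tag4, and tag5), and the three fields data, severity, and ver are now optional: each one is wrapped in a non-capturing group followed by ?, so the group matches zero or one time.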

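I also replaced \s with \s+ so that the pattern matches no matter how many whitespace characters separate the fields, and I removed the redundant parentheses from the pattern.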
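
To try the optional-group idea outside Logstash, here is a minimal sketch in plain Python regex. It is not grok: the QS expression below is only an approximation of grok's %{QS}, the names QS, TAIL, TAG, and parse_tail are made up for illustration, and only the tail of the line (from [msg ...] to [hostname ...]) is modelled. Unlike the grok pattern, it collects any number of tags into a list instead of stopping at five.

```python
import re

# Approximation of grok's %{QS}: a double-quoted string that may contain
# backslash escapes such as \x22. This is an assumption, not grok's definition.
QS = r'"(?:[^"\\]|\\.)*"'

# Tail of the log line. data, severity and ver each sit in an optional
# non-capturing group, so a line without them still matches.
TAIL = re.compile(
    r'\[msg (?P<message>' + QS + r')\]'
    r'(?:\s+\[data (?P<data>' + QS + r')\])?'
    r'(?:\s+\[severity (?P<severity>' + QS + r')\])?'
    r'(?:\s+\[ver (?P<ver>' + QS + r')\])?'
    # Zero or more complete [tag "..."] entries, captured as one span.
    r'(?P<tags>(?:\s+\[tag ' + QS + r'\])*)'
    # Lazily skip anything else (e.g. a truncated "[t ") up to hostname.
    r'.*?\[hostname (?P<hostname>' + QS + r')\]'
)

# Used to split the captured tag span into individual values.
TAG = re.compile(r'\[tag (' + QS + r')\]')


def parse_tail(line):
    """Return a dict of fields, or None if the line does not match."""
    m = TAIL.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["tags"] = TAG.findall(fields["tags"])
    return fields


if __name__ == "__main__":
    sample = ('[msg "Outbound Anomaly Score Exceeded (Total Score: 4)"] '
              '[tag "anomaly-evaluation"] [hostname "shop.gnet.it"]')
    print(parse_tail(sample))
```

Fields that are missing from a line come back as None, and tags is always a list, possibly empty. As with %{QS}, the captured values keep their surrounding double quotes.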