Parsing a syslog using Regex
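I'm writing a regex to parse syslog entries. I'm having trouble parsing the entry once I reach "CMD": I want everything that appears after CMD to be captured inside the (). Also, could you suggest ways to improve the regex?
Here is my syslog entry: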
Nov 21 23:17:01 ubuntu-xenial CRON[10299]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
(?<month>[A-z]{3})\s(?<date>[0-9]{2}?)\s(?<time>[0-9]+:[0-9]+:[0-9]+)\s(?<hostname>[a-z]+-[a-z]+)\s(?<daemon>[A-Z]+)(?<pid>\[[0-9]+\]):\s(?<user>\([a-z]+\))
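Here is my modification, with comments. In general, you're better off making fewer assumptions about what a field will contain. Here I use \S, i.e. "anything except whitespace". Also, \s+ matches a run of whitespace, whether it is one character or several.
The pattern is laid out in extended (free-spacing) mode, so it needs the x flag (re.VERBOSE in Python) for the layout whitespace and the # comments to be ignored.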
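A small check of the \S and \s+ points, using Python's standard re module. BSD-style syslog (RFC 3164) pads a single-digit day with a space, so the line "Nov  1 ..." below is an assumed example of such an entry; it does not come from the question.

```python
import re

# \S+ grabs any run of non-whitespace, so brackets, digits and colons all count.
tokens = re.findall(r"\S+", "CRON[10299]: (root) CMD")
print(tokens)  # ['CRON[10299]:', '(root)', 'CMD']

# Assumed example line: a single-digit day padded with an extra space, "Nov  1".
padded = "Nov  1 03:05:01 ubuntu-xenial CRON[42]:"

# One \s plus exactly two digits (what the question's \s and [0-9]{2}? amount to) cannot match.
strict = re.match(r"\S+\s[0-9]{2}", padded)

# \s+ absorbs both spaces and [0-9]{1,2} accepts the single digit.
relaxed = re.match(r"(\S+)\s+([0-9]{1,2})", padded)

print(strict)            # None
print(relaxed.groups())  # ('Nov', '1')
```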
(?<month>\S+) #
\s+ # added + because single-digit dates might have additional spaces
(?<date>[0-9]{1,2}) # changed {2}? to {1,2} because you might have one or two digits
\s+ #
(?<time>[0-9]+:[0-9]+:[0-9]+) #
\s+ #
(?<hostname>\S+) # anything which isn't whitespace
\s+ #
(?<daemon>\S+) # just in case your daemon has a digit or lower case in its name
(?<pid>\[[0-9]+\]) #
: #
\s+ #
\((?<user>\S+)\) # your username might have digits in it; don't capture the brackets
\s+ #
CMD #
\s+ #
\((?<command>.*)\) # capture the command, not the brackets
\s* # in case of trailing space
$ # match end of string
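A minimal sketch of applying the commented pattern to the sample entry, assuming Python's standard re module. Python writes named groups as (?P<name>...) rather than (?<name>...), and re.VERBOSE is what lets the pattern keep its one-token-per-line layout with # comments. The names SYSLOG_CRON, line and fields are illustrative only.

```python
import re

# The answer's pattern, transcribed for Python: (?P<name>...) named groups, re.VERBOSE layout.
SYSLOG_CRON = re.compile(r"""
    (?P<month>\S+)
    \s+
    (?P<date>[0-9]{1,2})
    \s+
    (?P<time>[0-9]+:[0-9]+:[0-9]+)
    \s+
    (?P<hostname>\S+)
    \s+
    (?P<daemon>\S+)           # greedy, then backtracks so the [pid] part can match
    (?P<pid>\[[0-9]+\])
    :
    \s+
    \((?P<user>\S+)\)         # brackets matched but not captured
    \s+
    CMD
    \s+
    \((?P<command>.*)\)       # everything between the outer parentheses
    \s*
    $
""", re.VERBOSE)

line = "Nov 21 23:17:01 ubuntu-xenial CRON[10299]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)"

m = SYSLOG_CRON.match(line)
fields = m.groupdict() if m else {}
for name, value in fields.items():
    print(f"{name}: {value!r}")
```

Because nothing consumes whitespace right after the opening "(", the captured command keeps the leading space from the log line (" cd / && ..."); call .strip() on it if that matters.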