Why is the Python regex wildcard only matching a newline?
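I am writing a program that uses Python regex to parse log messages. I have worked out everything up to the log message itself. The message could be any number of characters of any kind, so I assumed the .* wildcard would be the best way to handle it, since it matches everything except a newline.
However, when I use the wildcard, the only thing returned in this case is the newline. Any ideas? Here is the code and output: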
import os
import re

# Change to and print correct file path
os.chdir('/Users/MacUser/Desktop/regExPython')
print(os.getcwd())

# Iterate and read from syslogexample.txt then print results
line_number = 0
with open('syslogexample.txt', 'r') as syslog:
    log_lines = syslog.readlines()
for line in log_lines:
    line_number += 1
    print('{:>4} {}'.format(line_number, line.rstrip()))

# Build regex to parse through the data
DATE_RE = r'(\w{3}\s+\d{2})'
TIME_RE = r'(\d{2}:\d{2}:\d{2})'
DEVICE_RE = r'(\S+)'
PROCESS_RE = r'(\S+\s+\S+:)'
MESSAGE_RE = r'(.*)'
CD_RE = r'(\s+)'
Syslog_RE = DATE_RE + CD_RE + \
            TIME_RE + CD_RE + \
            DEVICE_RE + CD_RE + \
            PROCESS_RE + CD_RE + \
            MESSAGE_RE

# Use regex to parse through the data
for line in log_lines:
    m = re.match(Syslog_RE, line)
    if m:
        print(m.groups())
#Printed log Files
1 apr 29 08:22:13 mac-users-macbook-8 syslogd[49]: asl sender statistics
2 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):
3 service "com.apple.emond.aslmanager" tried to hijack endpoint "com.apple.aslmanager" from owner:
4 com.apple.aslmanager
5 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):
6 service "com.apple.emond.aslmanager" tried to hijack endpoint
7 "com.apple.activity_tracing.cache-delete" from owner: com.apple.aslmanager
8 apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] (com.apple.bsd.dirhelper[14184]):
9 endpoint has been activated through legacy launch(3) apis. please switch to xpc or
10 bootstrap_check_in(): com.apple.bsd.dirhelper
11 apr 29 08:22:19 mac-users-macbook-8 com.apple.xpc.launchd[1]
12 (com.apple.imfoundation.imremoteurlconnectionagent): unknown key for integer:
13 _dirtyjetsammemorylimit
Parsed Log Files
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):', '\n', '')
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.xpc.launchd.domain.system):', '\n', '')
('apr 29', ' ', '08:22:17', ' ', 'mac-users-macbook-8', ' ', 'com.apple.xpc.launchd[1] (com.apple.bsd.dirhelper[14184]):', '\n', '')
Process finished with exit code 0
As you can see at the end, the only character printed for MESSAGE_RE is the \n newline, which I thought would not be printed at all.
Thanks, everyone!
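On http://www.regex101.com the regex does not work correctly because .* only captures up to a newline character, meaning that at a line break, e.g. from line 3 to line 4, it stops matching. Note also that your loop matches one line at a time: on lines 2, 5 and 8 the process field ends the line, so (\s+) captures the trailing '\n' and (.*) is left with an empty string, while the message itself sits on the following lines. Maybe try compiling the regex with re.compile() before calling re.match(); that does not change what is matched by itself, but it gives you a place to pass flags. Python's re module has the DOTALL flag, which makes . match newline characters as well: http://docs.python.org/2/library/re.html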
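To illustrate the point about newlines, here is a minimal sketch that reuses the pattern from the question on one entry copied from the printed log (lines 2 to 4). It assumes the entry is available as a single string rather than as separate lines; the variable names are only for illustration, and this is not a complete syslog parser.

```python
import re

# Pattern pieces copied from the question, joined in the same order.
SYSLOG_RE = (
    r'(\w{3}\s+\d{2})' r'(\s+)'
    r'(\d{2}:\d{2}:\d{2})' r'(\s+)'
    r'(\S+)' r'(\s+)'
    r'(\S+\s+\S+:)' r'(\s+)'
    r'(.*)'
)

# One log entry from the question (printed lines 2 to 4), as a single string.
entry = (
    'apr 29 08:22:17 mac-users-macbook-8 com.apple.xpc.launchd[1] '
    '(com.apple.xpc.launchd.domain.system):\n'
    'service "com.apple.emond.aslmanager" tried to hijack endpoint '
    '"com.apple.aslmanager" from owner:\n'
    'com.apple.aslmanager\n'
)

# 1) Line by line, as in the question: (\s+) takes the trailing '\n'
#    and (.*) has nothing left to match.
first_line = entry.splitlines(keepends=True)[0]
per_line = re.match(SYSLOG_RE, first_line).groups()
print(per_line[-2:])   # ('\n', '')

# 2) Whole entry, default flags: '.' stops at the next newline,
#    so the message ends after "from owner:".
no_dotall = re.match(SYSLOG_RE, entry).groups()
print(repr(no_dotall[-1]))

# 3) Whole entry with re.DOTALL: '.' also matches '\n',
#    so (.*) runs to the end of the entry.
dotall = re.match(SYSLOG_RE, entry, re.DOTALL).groups()
print(repr(dotall[-1]))
```

With DOTALL, .* runs to the end of whatever string it is given, so applying this to a whole file would swallow every later entry into the first message. The text would still need to be split into entries first.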