How to use Python regular expressions to process ZooKeeper log files?
I have ZooKeeper logs like the following:
2019-09-25 11:16:39,253 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:745)
2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002
2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded
I'm trying to get the following result:
log entry 1:
2019-09-25 11:16:39,253 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:745)
log entry 2:
2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002
log entry 3:
2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded
I tried the following regex pattern:
import re
content = "2019-09-25 11:16:39,253 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n \
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n \
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n \
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n \
at java.lang.Thread.run(Thread.java:745)\n \
2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n \
2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n \
"
pattern = re.compile(r"(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}.*)+", re.DOTALL | re.MULTILINE)
match = re.match(pattern, content)
for f in match.groups():
    print(f, "\nEND")
But it matches the entire content as a single block:
2019-09-25 11:16:39,253 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
at java.lang.Thread.run(Thread.java:745)
2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002
2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded
END
Does anyone know how to fix this? Thanks a lot!
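To see why the attempt above returns one block, here is a small diagnostic sketch that reuses the question's timestamp pattern on two of the single-line entries from the log (the variable names are illustrative). It shows that with re.DOTALL the greedy .* consumes the rest of the input on its first repetition, and that simply switching to a lazy .*? without any stopping condition swings too far the other way.

```python
import re

# Two single-line entries taken from the question's log.
content = (
    "2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n"
    "2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n"
)

ts = r"\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}"

# The question's pattern: with DOTALL, the greedy .* runs to the end of the
# input on the first repetition, so the group captures everything at once.
greedy = re.match("(" + ts + r".*)+", content, re.DOTALL)
print(len(greedy.groups()))        # 1: groups() has one slot per group in the pattern
print(greedy.group(1) == content)  # True: that single capture is the whole input

# Making .* lazy is not enough on its own: with nothing after it to match,
# .*? stops immediately and each match is just the timestamp.
print(re.findall(ts + ".*?", content, re.DOTALL))
```

The fix therefore needs a lazy match plus an explicit condition for where an entry ends, which is what both answers below provide.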
Here is a working version of what you are trying, with slight modifications:
import re

content = """2019-09-25 11:16:39,253 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n \
EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n \
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n \
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n \
at java.lang.Thread.run(Thread.java:745)\n \
2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n \
2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n \
"""
logs = re.findall(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[.*?\] - (?:TRACE|DEBUG|INFO|WARN|ERROR).*?(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[.*?\] - (?:TRACE|DEBUG|INFO|WARN|ERROR)|$)', content, flags=re.DOTALL)
print(logs)
This prints:
['2019-09-25 11:16:39,253 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n at java.lang.Thread.run(Thread.java:745)\n ',
'2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n ',
'2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n ']
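The regex logic used here defines the start of a log entry as a timestamp, followed by a bracketed field such as [myid:], a dash, and one of the log levels (TRACE, DEBUG, INFO, WARN or ERROR). The pattern then uses a lazy .*? in DOTALL mode to match across lines until it reaches the start of another log entry or the end of the input.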
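The same approach can be wrapped in a small helper that prints entries in the "log entry N:" format the question asks for. This is a sketch, not the answerer's code: split_entries is a hypothetical helper name, and it reuses the answer's entry-start pattern (built once as a string so the two copies cannot drift apart) with re.finditer, stripping trailing whitespace from each entry.

```python
import re

# Entry start as described above: timestamp, a bracketed field, " - ", a level.
START = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[.*?\] - (?:TRACE|DEBUG|INFO|WARN|ERROR)"
# Lazily match (across lines, via DOTALL) up to the next entry start or the end.
ENTRY = re.compile(START + r".*?(?=" + START + r"|$)", re.DOTALL)


def split_entries(text):
    """Hypothetical helper: return each log entry with trailing whitespace removed."""
    return [m.group(0).rstrip() for m in ENTRY.finditer(text)]


log_text = (
    "2019-09-25 11:16:39,253 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n"
    "EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n"
    "at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n"
    "at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n"
    "at java.lang.Thread.run(Thread.java:745)\n"
    "2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n"
    "2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n"
)

for number, entry in enumerate(split_entries(log_text), start=1):
    print(f"log entry {number}:")
    print(entry)
```

For a real log file, read the whole file into one string first (for example with open(...).read()), because the pattern has to see the stack-trace continuation lines together with the entry they belong to.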
You could try the following regex:
\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}(?:(?!\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})[\s\S])*
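Explanation:
1. \d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3} - matches a timestamp of the form XXXX-XX-XX XX:XX:XX,XXX, where X is a digit.
2. (?:(?!\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3})[\s\S])* - matches any character (newlines included, since [\s\S] covers every character), zero or more times, as long as that position does not begin another timestamp in the format from point 1.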
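A minimal sketch of this pattern in use with re.findall. TS and ENTRY are illustrative names; the compiled pattern is assembled from the timestamp piece and is character-for-character the regex given above. Because [\s\S] already matches newlines, no re.DOTALL flag is needed, and unlike the previous answer this pattern does not depend on the log level.

```python
import re

# Point 1: a timestamp of the form XXXX-XX-XX XX:XX:XX,XXX.
TS = r"\d{4}(?:-\d{2}){2} \d{2}(?::\d{2}){2},\d{3}"
# Point 2: any character, newlines included, as long as it does not start
# another timestamp. This builds exactly the regex given in the answer.
ENTRY = re.compile(TS + r"(?:(?!" + TS + r")[\s\S])*")

log_text = (
    "2019-09-25 11:16:39,253 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception\n"
    "EndOfStreamException: Unable to read additional data from client sessionid 0x16d666b95e10002, likely client has closed socket\n"
    "at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)\n"
    "at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)\n"
    "at java.lang.Thread.run(Thread.java:745)\n"
    "2019-09-25 11:16:39,260 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /192.168.101.231:48311 which had sessionid 0x16d666b95e10002\n"
    "2019-09-25 11:16:40,000 [myid:] - INFO [SessionTracker:ZooKeeperServer@358] - Expiring session 0x36b63c29fbac528, timeout of 10000ms exceeded\n"
)

# The pattern has no capturing groups, so findall returns the whole matches.
entries = [e.rstrip("\n") for e in ENTRY.findall(log_text)]
for entry in entries:
    print(entry)
    print("END")
```

The negative lookahead is checked before every character, so this is typically slower than the lazy-quantifier version on large files, but it keeps working if the log uses a level name outside the TRACE/DEBUG/INFO/WARN/ERROR list.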