Parsing a log file by applying a condition
I have a debug log file that looks like this:
Sample file:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
I only want to get the ID and the final output, as shown below.
Expected output:
<ID> "output
output output
output"
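I would like to do this in Python or bash. Any help would be appreciated.
Thanks
My current code only works for the "final output". I also want to get the ID, and there should be a way (a delimiter) to tell each ID and its output apart.
with open("debuglog.txt", "r") as stream:
    lines = stream.readlines()

flag = 0
for i in lines:
    # Any DEBUG line ends the previous multi-line message.
    if "DEBUG:" in i:
        flag = 0
    # Start printing at the "Final output is" line (case matches the log).
    if "Final output is" in i:
        flag = 1
    if flag:
        print(i.rstrip("\n"))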
Sample log file:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start 12324
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output output output output"
DEBUG: extra lines
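Please find the code below. I am assuming that there is only one instance of each ID and output, and that the output fits on a single line.
import re

with open("log", "r") as stream:
    lines = stream.readlines()

flag_ID = 0
flag_output = 0
flag_print = 1
for i in lines:
    # Raw strings keep the regex escapes (\w, \d) intact.
    ID = re.match(r"DEBUG: [\w :]* start (\d+)", i)
    output = re.match(r'DEBUG: [\w :]* Final output is "([\w ]*)"', i)
    if ID:
        flag_ID = 1
        value_ID = ID.group(1)
    if output:
        flag_output = 1
        value_output = output.group(1)
    # Print once, as soon as both the ID and the output have been seen.
    if flag_output == 1 and flag_ID == 1 and flag_print == 1:
        print("{0} {1}".format(value_ID, value_output))
        flag_print = 0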
Output:
12324 output output output output
If this solves your problem, please tick and accept it ;)
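The answer above matches one line at a time, so it cannot pick up an output that continues over several lines, and it prints only one ID. A minimal sketch of a line-by-line variant that handles both cases is shown here. The names parse_log, START_RE and FINAL_RE are hypothetical, and the line shapes are assumed from the sample log in the question.

```python
import re

# Assumed line shapes, taken from the sample log in the question.
FINAL_RE = re.compile(r'^DEBUG:.*Final output is (.*)$')
START_RE = re.compile(r'^DEBUG:.*\bstart (\S+)\s*$')


def parse_log(lines):
    """Yield (id, output) pairs; an output may span several lines."""
    current_id = None
    collecting = None  # list of output lines while inside a message
    for raw in lines:
        line = raw.rstrip('\n')
        if line.startswith('DEBUG:'):
            # Any new DEBUG line ends a multi-line message in progress.
            if collecting is not None:
                yield current_id, '\n'.join(collecting)
                collecting = None
            m = FINAL_RE.match(line)
            if m:
                collecting = [m.group(1)]
                continue
            m = START_RE.match(line)
            if m:
                current_id = m.group(1)
        elif collecting is not None:
            # Continuation line of a multi-line message.
            collecting.append(line)
    # Flush a message that runs to the end of the input.
    if collecting is not None:
        yield current_id, '\n'.join(collecting)


sample = '''DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start 12324
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
'''

records = list(parse_log(sample.splitlines(True)))
for log_id, output in records:
    print(log_id, output)
    print('-' * 20)  # delimiter between records
```

Because parse_log is a generator over lines, it can also be fed an open file object directly, which avoids reading a large log into memory.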
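With Python, how about:
#!/usr/bin/python
import re

with open("logfile", "r") as f:
    text = f.read()

# Group 1: the ID after "start"; group 2: the (possibly multi-line) final output.
regex = r'start (.+?)$.*?Final output is (.+?)(?:(?=\nDEBUG)|\Z)'
for m in re.finditer(regex, text, re.MULTILINE | re.DOTALL):
    for i in m.groups():
        # Join a multi-line message into a single line.
        print(i.replace('\n', ' '))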
Input log file:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID2>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output2
output+ output/
output2"
Output:
<ID>
"output output output output"
<ID2>
"output2 output+ output/ output2"
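- The first pair of parentheses in the regex captures everything after "start" up to the end of that line and stores it in group 1.
- The second pair of parentheses captures everything after "Final output is" up to (but not including) the next line that begins with DEBUG, or up to the end of the string, and stores it in group 2. Because of the re.DOTALL option, the captured string may contain newlines.
- The third pair of parentheses is a non-capturing group that holds zero-width anchors (a lookahead for a newline followed by DEBUG, or \Z). It only marks where the match stops and is not included in any captured group.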
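To make the three parts easier to see, here is a small sketch that writes the same pattern in pieces and runs it on a shortened, made-up log (the timestamp is abbreviated to "t" for brevity; that input is an assumption, not the real file).

```python
import re

# Shortened stand-in for the log; only the line structure matters here.
text = (
    'DEBUG: t: start <ID>\n'
    'DEBUG: t: Final output is "a\n'
    'b"\n'
    'DEBUG: extra lines\n'
)

pattern = re.compile(
    r'start (.+?)$'            # group 1: rest of the "start" line
    r'.*?Final output is '     # skip ahead lazily (DOTALL lets . cross lines)
    r'(.+?)'                   # group 2: the message, possibly multi-line
    r'(?:(?=\nDEBUG)|\Z)',     # stop before the next DEBUG line or at the end
    re.MULTILINE | re.DOTALL,
)

pairs = [m.groups() for m in pattern.finditer(text)]
print(pairs)
```

re.MULTILINE is what lets $ match at the end of the "start" line rather than only at the end of the whole string, and re.DOTALL is what lets group 2 run across the line break inside the quoted message.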
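EDIT
The updated version below handles multiple "final output" entries for a single ID and shows only the last output for each ID:
#!/usr/bin/python
import re

with open("logfile", "r") as f:
    text = f.read()

# regex: the ID plus everything up to the next "start" line (or end of file).
regex = r'start (.+?)$(.+?)(?:(?=DEBUG[^\n]+?start)|\Z)'
# regex2: each "Final output is" message inside that block.
regex2 = r'Final output is (.+?)(?:(?=\nDEBUG)|\Z)'
for m in re.finditer(regex, text, re.MULTILINE | re.DOTALL):
    print(m.group(1))
    m2 = re.finditer(regex2, m.group(2), re.MULTILINE | re.DOTALL)
    # Keep only the last "final output" found for this ID.
    print(list(m2).pop().group(1).replace('\n', ' '))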
Input log file:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID1>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "this
is the last output
for <ID1>"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID2>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output2
output+ output/
output2"
And the output:
<ID1>
"this is the last output for <ID1>"
<ID2>
"output2 output+ output/ output2"
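I split the extraction of the substrings into two steps:
- Extract the ID and the remaining text (which may contain extra strings). This is handled by regex.
- Extract every "final output" substring from the "remaining text" above. This is handled by regex2.
Then the last "final output" is selected and displayed.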
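The two steps can also be wrapped in a function that returns the results instead of printing them. The following is only a sketch: last_outputs, BLOCK_RE and FINAL_RE are hypothetical names, the sample text is made up, and an ID with no "final output" at all is reported as None rather than raising an error.

```python
import re

# Step 1: an ID plus everything up to the next "start" line (or end of text).
BLOCK_RE = re.compile(r'start (.+?)$(.+?)(?:(?=DEBUG[^\n]+?start)|\Z)',
                      re.MULTILINE | re.DOTALL)
# Step 2: each "Final output is" message inside one block.
FINAL_RE = re.compile(r'Final output is (.+?)(?:(?=\nDEBUG)|\Z)', re.DOTALL)


def last_outputs(text):
    """Return [(id, last_output_or_None), ...] in file order."""
    results = []
    for block in BLOCK_RE.finditer(text):
        finals = FINAL_RE.findall(block.group(2))
        # Take the last message, if any, and flatten it to one line.
        last = finals[-1].strip().replace('\n', ' ') if finals else None
        results.append((block.group(1), last))
    return results


# Made-up input: <ID1> has two outputs, <ID2> has none.
sample = (
    'DEBUG: t: start <ID1>\n'
    'DEBUG: t: Final output is "first"\n'
    'DEBUG: extra lines\n'
    'DEBUG: t: Final output is "this\n'
    'is last"\n'
    'DEBUG: t: start <ID2>\n'
    'DEBUG: extra lines\n'
)

results = last_outputs(sample)
for log_id, output in results:
    print(log_id, output)
```

Returning a list of pairs keeps the choice of delimiter between records with the caller, which is what the question asked for.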
EDIT
The following version suppresses messages that contain a certain keyword:
#!/usr/bin/python
import re

with open("logfile", "r") as f:
    text = f.read()

exclude = 'xyz'  # keyword to suppress the output
regex = r'start (.+?)$(.+?)(?:(?=DEBUG[^\n]+?start)|\Z)'
regex2 = r'Final output is (.+?)(?:(?=\nDEBUG)|\Z)'
for m in re.finditer(regex, text, re.MULTILINE | re.DOTALL):
    print(m.group(1))
    m2 = re.finditer(regex2, m.group(2), re.MULTILINE | re.DOTALL)
    message = list(m2).pop().group(1).replace('\n', ' ')
    # Report the keyword instead of the message when it is present.
    if exclude in message:
        print('error:' + exclude)
    else:
        print(message)
Sample log file:
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID1>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "this
is the last output
for ID1"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID2>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output2
output+ output/
output2"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID3>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "this message
contains the word xyz"
DEBUG: extra lines
Output:
<ID1>
"this is the last output for ID1"
<ID2>
"output2 output+ output/ output2"
<ID3>
error:xyz
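If more than one keyword should be suppressed, the check can be pulled out into a small helper. This is a sketch only: render and the keyword list are hypothetical, and the (id, message) pairs below stand in for whatever the parsing step produced.

```python
def render(message, exclude_keywords):
    """Return message, or 'error:<keyword>' for the first keyword it contains."""
    for keyword in exclude_keywords:
        if keyword in message:  # plain substring test, like message.count()
            return 'error:' + keyword
    return message


# Hypothetical (id, message) pairs standing in for the parsed log.
parsed = [
    ('<ID2>', '"output2 output+ output/ output2"'),
    ('<ID3>', '"this message contains the word xyz"'),
]

rendered = [(log_id, render(msg, ['xyz', 'abc'])) for log_id, msg in parsed]
for log_id, line in rendered:
    print(log_id)
    print(line)
```

Note that this is a substring test, so a keyword such as xyz would also match inside a longer word. A word-boundary regex would be needed if that matters.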
With Perl, if the file fits in memory, you can do it in a single line:
/tmp> cat debug.log
DEBUG: Fri Dec 7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16920: start <ID1>
DEBUG: Fri Dec 7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines
DEBUG: Fri Dec 7 06:49:14 2018:16921 extra text
DEBUG: Fri Dec 7 06:49:14 2018:16921: start <ID2>
DEBUG: Fri Dec 7 06:49:14 2018:16921: Final output is "output output output output"
DEBUG: extra lines
/tmp>
/tmp> perl -0777 -ne ' while(/^DEBUG(.+?)start (\S+).*?DEBUG.+?Final output is \"(.+?)\"/smg) { print "$2 $3\n" } ' debug.log
<ID1> output
output output
output
<ID2> output output output output
/tmp>
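For comparison, the same single-pattern idea can be sketched in Python. The pattern below is a direct transcription of the Perl one, with re.MULTILINE and re.DOTALL playing the role of the /m and /s modifiers. The input text is a shortened, made-up log, and the name found is hypothetical.

```python
import re

# Shortened stand-in for debug.log; timestamps are abbreviated.
text = (
    'DEBUG: a extra text\n'
    'DEBUG: a: start <ID1>\n'
    'DEBUG: a: Final output is "x\n'
    'y"\n'
    'DEBUG: extra lines\n'
    'DEBUG: b extra\n'
    'DEBUG: b: start <ID2>\n'
    'DEBUG: b: Final output is "z"\n'
    'DEBUG: extra lines\n'
)

# Group 1 is the ID, group 2 the text between the double quotes.
pattern = re.compile(
    r'^DEBUG.+?start (\S+).*?DEBUG.+?Final output is "(.+?)"',
    re.MULTILINE | re.DOTALL,
)

found = pattern.findall(text)
for log_id, output in found:
    print(log_id, output)
```

Unlike the lookahead-based patterns earlier in the thread, this one relies on the closing double quote to end the message, so it assumes the message itself contains no double quotes.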