Trimming a log file between two lines
I wrote a Python script to trim a log file between two lines.
Here is what I wrote:
import optparse
import datetime

parser = optparse.OptionParser()
parser.add_option("-f", "--file", dest="log_file",
                  action="store", help="Specify log file to be parsed")
options, args = parser.parse_args()
vLogFile = options.log_file

start_time = raw_input("Please enter start time:\n[Format: HH:MM]=")
end_time = raw_input("Please enter end time:\n[Format: HH:MM]=")
trim_time = datetime.datetime.now().strftime('%d%H%M%S')
output_file = 'trimmed_log_%s.txt' % trim_time

with open(vLogFile) as file:
    for vline in file:
        vDate = vline[0:10]
        break
    start_line = vDate + ' ' + start_time
    end_line = vDate + ' ' + end_time
    print("Start time:%s" % start_line)
    print("End time:%s" % end_line)
    for num, line in enumerate(file, 1):
        if line.startswith(start_line):
            start_line_number = num
            break
    for num, line in enumerate(file, 1):
        if line.startswith(end_line):
            end_line_number = num
            break
    file.close()

print(start_line_number, end_line_number)

with open(vLogFile, "r") as file:
    oFile = open(output_file, 'a')
    for num, line in enumerate(file, 1):
        if num >= start_line_number and num <= end_line_number:
            oFile.write(line)

print("%s Created" % output_file)
Here is the result of running the script:
$ python trim.py -f ErrorLog.txt
Please enter start time:
[Format: HH:MM]=16:16
Please enter end time:
[Format: HH:MM]=16:29
Start time:2017-11-12 16:16
End time:2017-11-12 16:29
(333, 2084)
trimmed_log_23063222.txt Created
Here the start line (333) is correct, but the end line (2084) is not.
Here is my log file:
Can anyone help me fix this?
Thanks,
Yogesh
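The problem is that you enumerate the open file a second time without rewinding it, so the counter restarts at 1 partway through the file and the line numbers are no longer correct. You could call input_file.seek(0) before each scan to rewind, but there is a simpler way. Something like this might work for the main loop (dry-coded, YMMV). As a bonus, it reads the file only once.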
with open(vLogFile) as input_file, open(output_file, 'a') as out_file:
    do_write = False
    for i, line in enumerate(input_file, 1):
        if i == 1:  # First line, so figure out the start/end markers
            vDate = line[0:10]
            start_line = vDate + ' ' + start_time
            end_line = vDate + ' ' + end_time
        if not do_write and line.startswith(start_line):  # If we need to start copying...
            do_write = True
            print('Starting to write from line %d' % i)
        if do_write:
            out_file.write(line)
        if line.startswith(end_line):  # Stop writing, we have everything
            print('Stopping write on line %d' % i)
            break
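For comparison, the rewind approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the helper name find_line_numbers and the in-memory sample log are hypothetical, and the code is written for Python 3, whereas the question uses Python 2.

```python
import io


def find_line_numbers(log, start_prefix, end_prefix):
    """Return the 1-based numbers of the first lines starting with each prefix.

    `log` is any seekable text file object. The function name and its
    parameters are hypothetical, chosen for this sketch.
    """
    start_no = end_no = None

    log.seek(0)  # rewind so enumerate() counts from the real first line
    for num, line in enumerate(log, 1):
        if line.startswith(start_prefix):
            start_no = num
            break

    log.seek(0)  # rewind again; otherwise the count restarts mid-file
    for num, line in enumerate(log, 1):
        if line.startswith(end_prefix):
            end_no = num
            break

    return start_no, end_no


# Made-up sample data standing in for the real log file.
sample = io.StringIO(
    "2017-11-12 16:15:00.000 Info: a\n"
    "2017-11-12 16:16:16.642 Info: b\n"
    "2017-11-12 16:20:00.000 Info: c\n"
    "2017-11-12 16:29:01.000 Info: d\n"
)
result = find_line_numbers(sample, "2017-11-12 16:16", "2017-11-12 16:29")
print(result)  # prints (2, 4)
```

This scans the file twice, which is why the single-pass loop above is usually preferable for large logs.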
This is a good use case for itertools.dropwhile() and itertools.takewhile():
import itertools
from datetime import datetime

start_time = datetime.strptime("16:16", "%H:%M")
end_time = datetime.strptime("16:29", "%H:%M")

with open('ErrorLog.txt') as f_log, open('trimmed.txt', 'w') as f_trimmed:
    # Skip lines earlier than start_time, write the first matching line, then stop
    for row in itertools.dropwhile(lambda x: datetime.strptime(x[11:16], "%H:%M") < start_time, f_log):
        f_trimmed.write(row)
        break

    # Continue from the same file position and write lines until end_time is reached
    for row in itertools.takewhile(lambda x: datetime.strptime(x[11:16], "%H:%M") < end_time, f_log):
        f_trimmed.write(row)
This would give you a trimmed.txt like the following:
2017-11-12 16:16:16.642 Info: Forest Extensions state changed from open to start closing because shutting down
2017-11-12 16:16:16.642 Info: Database Extensions is offline
2017-11-12 16:16:16.643 Info: Forest Extensions state changed from start closing to middle closing because shutting down
.
.
2017-11-12 16:24:07.161 Info: Deleted 1 MB at 345 MB/sec /Users/yogeshjadhav96/Library/Application Support/MarkLogic/Data/Forests/App-Services/000001db
2017-11-12 16:24:07.165 Info: Deleted 10 MB at 2361 MB/sec /Users/yogeshjadhav96/Library/Application Support/MarkLogic/Data/Forests/App-Services/000001dc
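This filters out the lines that do not meet the start requirement, i.e. those that are too early, and then reads lines only until the end requirement is reached. Each row is read in, and the lambda function extracts the time, converts it to a datetime object, and compares it against start_time or end_time as appropriate.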
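The same filtering can be tried on a few in-memory lines without touching a file. The sketch below is a variation, not the code above: it chains takewhile() directly over dropwhile() instead of using two loops. The helper name trim_by_time and the sample lines are hypothetical. It assumes every line starts with a timestamp laid out as in the log shown; a line without one would make strptime() raise ValueError.

```python
import itertools
from datetime import datetime


def trim_by_time(lines, start, end):
    """Lazily yield lines whose HH:MM stamp is >= start and < end.

    `lines` is any iterable of log lines (an open file works too).
    The function name and parameters are hypothetical.
    """
    start_time = datetime.strptime(start, "%H:%M")
    end_time = datetime.strptime(end, "%H:%M")

    def stamp(line):
        # Characters 11..15 hold HH:MM in "YYYY-MM-DD HH:MM:SS.mmm ..."
        return datetime.strptime(line[11:16], "%H:%M")

    # Drop everything before the start time, then keep lines until the end time.
    from_start = itertools.dropwhile(lambda ln: stamp(ln) < start_time, lines)
    return itertools.takewhile(lambda ln: stamp(ln) < end_time, from_start)


# Made-up sample lines in the same layout as the log above.
lines = [
    "2017-11-12 16:15:59.000 Info: too early\n",
    "2017-11-12 16:16:16.642 Info: first kept\n",
    "2017-11-12 16:24:07.165 Info: last kept\n",
    "2017-11-12 16:29:00.000 Info: too late\n",
]
trimmed = list(trim_by_time(lines, "16:16", "16:29"))
for row in trimmed:
    print(row, end="")
```

Note that the end bound is exclusive here, as in the code above: lines stamped 16:29 are not written.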