Parsing Logs with Regular Expressions in Python

Question

I'm a lightweight at coding and Python :)

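I have to go through some log files and find the ones that show errors. Boom, done. What I need now is to figure out how to grab the 10 lines that follow each error, since they contain the error details. I figure it must be some combination of an if statement and a for/while loop. Any help would be greatly appreciated.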

import os
import re

# Regex used to match
line_regex = re.compile(r"ERROR")

# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("NodeOut.log")
# Overwrites the file, ensure we're starting out with a blank file
#TODO Append this later
with open(output_filename, "w") as out_file:
    out_file.write("")

# Open output file in 'append' mode
with open(output_filename, "a") as out_file:
    # Open input file in 'read' mode
    with open("MXNode1.stdout", "r") as in_file:
        # Loop over each log line
        for line in in_file:
            # If the log line matches our regex, print it (remove later) and write it to the file
            if (line_regex.search(line)):
                # for i in range():
                print(line)
                out_file.write(line)

Answer 1

Assuming you always want to grab the next 10 lines, you can do something like this:

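import re

line_regex = re.compile(r"ERROR")

with open("NodeOut.log", "w") as out_file, open("MXNode1.stdout", "r") as in_file:
    # 11 means "not capturing"; the counter is reset to 0 on every ERROR line
    lineCount = 11
    # Loop over each log line
    for line in in_file:
        # On a match, print the line (debug output, remove later) and restart the window
        if line_regex.search(line):
            print(line, end="")
            lineCount = 0
        # Write the ERROR line itself plus the 10 lines that follow it
        if lineCount < 11:
            lineCount += 1
            out_file.write(line)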

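The second if statement writes the current line whenever lineCount is below 11. Because the counter is reset to 0 on the ERROR line itself and incremented once per written line, the magic number 11 covers that initial ERROR line plus the 10 lines after it. If another ERROR turns up inside that window, the counter is reset again, so the window is extended rather than cut short.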
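To try the counter logic without a log file, here is a minimal sketch of the same idea wrapped in a function that works on any iterable of lines. The name error_windows and its pattern/after parameters are hypothetical, chosen for illustration; the sample lines are made up.

```python
import re
from typing import Iterable, List


def error_windows(lines: Iterable[str], pattern: str = "ERROR", after: int = 10) -> List[str]:
    """Return each matching line plus the `after` lines that follow it.

    Same counter idea as above: the counter is reset to 0 on a match,
    and a line is kept while the counter is below after + 1.
    """
    regex = re.compile(pattern)
    count = after + 1  # start "outside" any capture window
    kept = []
    for line in lines:
        if regex.search(line):
            count = 0  # a new match starts (or extends) the window
        if count < after + 1:
            count += 1
            kept.append(line)
    return kept


sample = ["ok 1\n", "ERROR boom\n", "detail a\n", "detail b\n", "ok 2\n", "ok 3\n"]
print(error_windows(sample, after=2))
# ['ERROR boom\n', 'detail a\n', 'detail b\n']
```

Passing an open file object instead of a list works the same way, because iterating a file yields its lines.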

Answer 2

You don't need a regular expression for this; the in operator is enough ("ERROR" in line).
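For a fixed string like "ERROR", a substring test and a regex search select exactly the same lines; a regex only pays off when you need an actual pattern. The sketch below uses made-up log lines, and the word-boundary pattern is an illustrative assumption, not something from the question:

```python
import re

lines = [
    "2020-01-01 INFO started\n",
    "2020-01-01 ERROR disk full\n",
    "2020-01-01 INFO ERRORS=0\n",
]

# Plain substring test, as suggested above
substring_hits = [line for line in lines if "ERROR" in line]

# A regex is only worth it when you need more than a substring test,
# e.g. a word boundary so that "ERRORS=0" is not treated as an error
word_regex = re.compile(r"\bERROR\b")
regex_hits = [line for line in lines if word_regex.search(line)]

print(len(substring_hits), len(regex_hits))  # 2 1
```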

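Also, you don't have to open the file in w mode just to clear it: open it in append mode, move the cursor to the start of the file, and truncate it.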
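Checked in isolation, the truncate step behaves like this. The scratch file below is a hypothetical stand-in for NodeOut.log, created in a temporary directory; opening with "w" would achieve the same result in one step.

```python
import os
import tempfile

# Hypothetical scratch file standing in for NodeOut.log
path = os.path.join(tempfile.mkdtemp(), "NodeOut.log")
with open(path, "w") as f:
    f.write("stale output from a previous run\n")

with open(path, "a") as out_file:
    out_file.seek(0)      # move to the start (writes in 'a' mode still go to the end)
    out_file.truncate(0)  # cut the file down to 0 bytes
    out_file.write("fresh line\n")

with open(path) as f:
    print(repr(f.read()))  # 'fresh line\n'
```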

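import os

output_filename = os.path.normpath("NodeOut.log")

with open(output_filename, 'a') as out_file:
    # Empty the file without reopening it in 'w' mode
    out_file.seek(0)
    out_file.truncate(0)

    with open("MXNode1.stdout", 'r') as in_file:
        line = in_file.readline()
        # readline() returns '' only at end of file, so the loop stops at EOF
        while line:
            if "ERROR" in line:
                out_file.write(line)
                # Copy the next 10 lines; past EOF readline() returns '' and nothing is written
                for _ in range(10):
                    out_file.write(in_file.readline())
            line = in_file.readline()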

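We use a while loop with in_file.readline() to read the file line by line. The advantage is that, once a match is found, a for loop can simply call in_file.readline() again to read the lines that follow. Note that an ERROR appearing inside those 10 lines is copied as part of the block but does not start a new 10-line window.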
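The same "read the next lines" step can also be written with a plain for loop over the file, because a file object is an iterator: itertools.islice(in_file, N) takes the next N lines from that same iterator, and the outer loop resumes after them. This is an alternative technique, not the one in the answer; io.StringIO stands in for MXNode1.stdout and the sample lines are made up.

```python
import io
from itertools import islice

AFTER = 3  # the question wants 10; 3 keeps the demo short

# In-memory stand-in for MXNode1.stdout (made-up lines)
in_file = io.StringIO("ok\nERROR boom\ndetail 1\ndetail 2\ndetail 3\nok again\n")
out_file = io.StringIO()

for line in in_file:
    if "ERROR" in line:
        out_file.write(line)
        # islice pulls the next AFTER lines from the same iterator the for loop
        # is using, so the loop resumes after them; it stops early at EOF
        out_file.writelines(islice(in_file, AFTER))

print(out_file.getvalue(), end="")
```

Like the readline version, an ERROR inside the captured lines is copied but does not open a new window.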

From the documentation:

f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
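A quick check of that behaviour, using io.StringIO as a stand-in for a real file (its readline() follows the same rules). It shows why the while line: loop above stops at end of file but not at blank lines:

```python
import io

# A line, a blank line, and a last line with no trailing newline
f = io.StringIO("first\n\nlast")

print(repr(f.readline()))  # 'first\n'
print(repr(f.readline()))  # '\n'    (blank line, still truthy)
print(repr(f.readline()))  # 'last'  (no newline: the file doesn't end with one)
print(repr(f.readline()))  # ''      (end of file, falsy)
```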

