Python issue with printing regex group from log file

Question

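I can't print two regex groups from a log file. I don't get any errors; I just don't get any results.

I would like them to read:

12345@email.com = 19290
45678@email.com = 23625

In this case I only want to print the account and high score data from Category 2. I am new to Python, but I am trying to learn more by doing. It seems my regex returns no matches in Python, yet when I test the same pattern in the Regex101 tool I get both groups. Maybe the problem is how I am printing the groups. Any help would be much appreciated so I can learn from my mistake. :)

Here is my code: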

import re

log = open(r"C:\CurrentLog.txt","r")
regex = re.compile("Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}High Score = (?P<Score>\d{2,})", re.M)

for line in log:
    data = regex.findall(line)
    for word in data:
        print (line.group(Account))
        print (line.group(Score))

Example of the log file:

The actual log file will stay at around 400 to 600 lines, so I'm not worried about loading it into memory.

2019-10-17 17:56:44,295 :: INFO :: root :: -------------------------Category1-------------------------
2019-10-17 17:56:49,988 :: INFO :: root :: Account 12345@email.com...
2019-10-17 17:57:09,328 :: INFO :: root :: other info 1
2019-10-17 18:00:22,267 :: INFO :: root :: other info 2
2019-10-17 18:00:22,582 :: INFO :: root :: High Score = 19090
2019-10-17 18:00:22,582 :: INFO :: root :: other info 3
2019-10-17 18:00:22,582 :: INFO :: root :: other info 4
2019-10-17 18:00:24,661 :: INFO :: root :: -------------------------Category2-------------------------
2019-10-17 18:00:29,619 :: INFO :: root :: Account 12345@email.com...
2019-10-17 18:00:46,317 :: INFO :: root :: other info 1
2019-10-17 18:05:46,088 :: INFO :: root :: other info 2
2019-10-17 18:05:52,451 :: INFO :: root :: other info 3
2019-10-17 18:08:11,765 :: INFO :: root :: other info 4
2019-10-17 18:08:12,813 :: INFO :: root :: High Score = 19290
2019-10-17 18:08:12,814 :: INFO :: root :: other info 5
2019-10-17 18:08:12,814 :: INFO :: root :: other info 6
2019-10-17 18:08:14,890 :: INFO :: root :: -------------------------Category1-------------------------
2019-10-17 18:08:19,860 :: INFO :: root :: Account 45678@email.com...
2019-10-17 18:08:37,188 :: INFO :: root :: other info 1
2019-10-17 18:13:23,232 :: INFO :: root :: other info 2
2019-10-17 18:13:23,595 :: INFO :: root :: High Score = 23425
2019-10-17 18:13:23,595 :: INFO :: root :: other info 3
2019-10-17 18:13:23,595 :: INFO :: root :: other info 4
2019-10-17 18:13:25,689 :: INFO :: root :: -------------------------Category2-------------------------
2019-10-17 18:13:30,660 :: INFO :: root :: Account 45678@email.com...
2019-10-17 18:13:47,727 :: INFO :: root :: other info 1
2019-10-17 18:16:20,327 :: INFO :: root :: other info 2
2019-10-17 18:16:26,907 :: INFO :: root :: other info 3
2019-10-17 18:18:44,376 :: INFO :: root :: other info 4
2019-10-17 18:18:45,447 :: INFO :: root :: High Score = 23625
2019-10-17 18:18:45,447 :: INFO :: root :: other info 5
2019-10-17 18:18:45,447 :: INFO :: root :: other info 6

Let me know if you need more information or context.

Thanks!

Answer 1

for line in log:
    data = regex.findall(line)

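The code block above applies your regex to each line separately, which fails because your regex spans multiple lines. You need to run the regex against the entire file contents.

The code below should work: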

import re

# Read the entire file into one string so the multi-line pattern can match
with open(r"log.txt", "r") as log:
    contents = log.read()

# Raw string, so the backslashes reach the regex engine unchanged
regex = re.compile(r"Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}High Score = (?P<Score>\d{2,})", re.M)

# finditer is like findall, but it yields match objects, so the named groups are available through .group()
for match in regex.finditer(contents):
    print(match.group('Account'))
    print(match.group('Score'))
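
To make the difference between matching line by line and matching the whole text concrete, here is a minimal sketch that runs the pattern from the question both ways. The sample text is a trimmed copy of one Category2 block from the log in the question; the helper `log_line` and the variable names are made up for illustration.

```python
import re

def log_line(timestamp, message):
    # Hypothetical helper: rebuilds one line in the question's log layout.
    return f"{timestamp} :: INFO :: root :: {message}\n"

DASHES = "-" * 25
# Trimmed copy of one Category2 block from the sample log.
sample_log = (
    log_line("2019-10-17 18:00:24,661", f"{DASHES}Category2{DASHES}")
    + log_line("2019-10-17 18:00:29,619", "Account 12345@email.com...")
    + log_line("2019-10-17 18:00:46,317", "other info 1")
    + log_line("2019-10-17 18:05:46,088", "other info 2")
    + log_line("2019-10-17 18:05:52,451", "other info 3")
    + log_line("2019-10-17 18:08:11,765", "other info 4")
    + log_line("2019-10-17 18:08:12,813", "High Score = 19290")
)

regex = re.compile(
    r"Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}High Score = (?P<Score>\d{2,})",
    re.M,
)

# One line at a time: the pattern needs seven consecutive lines, so nothing matches.
per_line_hits = [
    match
    for line in sample_log.splitlines(keepends=True)
    for match in regex.finditer(line)
]

# Whole text at once: the pattern can span the header, account and score lines.
whole_text_hits = [
    (match.group("Account"), match.group("Score"))
    for match in regex.finditer(sample_log)
]

print(per_line_hits)
print(whole_text_hits)
```

On this sample the per-line list comes back empty, while the whole-text list holds the single account and score pair.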

Answer 2

I think you have overcomplicated the regex a bit. Try this:

RE_PATTERN = re.compile(r'Account\s(?P<Account>.+?\.com).*?High Score = (?P<Score>\d+)', re.DOTALL)

# Read the entire log as one string (log is the file object opened in the question)
for match in RE_PATTERN.finditer(log.read()):
    print(match.group('Account'))
    print(match.group('Score'))

With re.DOTALL, . also matches \n, so .*? consumes anything until it reaches the text High Score =.
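
One thing to watch: the pattern above does not look at the category header, so it reports the account and score of every block, Category1 included. A possible variant, sketched below, anchors the match at a Category2 header. The helper `block`, the shortened sample log, the placeholder timestamp and the `Category2-+\n` prefix are assumptions for illustration, and the variant also assumes every Category2 block contains a High Score line.

```python
import re

def block(category, account, score):
    # Hypothetical helper: builds a shortened block in the question's log layout.
    prefix = "2019-10-17 18:00:00,000 :: INFO :: root :: "  # placeholder timestamp
    dashes = "-" * 25
    messages = [
        f"{dashes}{category}{dashes}",
        f"Account {account}...",
        "other info 1",
        f"High Score = {score}",
        "other info 2",
    ]
    return "".join(prefix + message + "\n" for message in messages)

sample_log = (
    block("Category1", "12345@email.com", 19090)
    + block("Category2", "12345@email.com", 19290)
    + block("Category1", "45678@email.com", 23425)
    + block("Category2", "45678@email.com", 23625)
)

# The pattern from this answer, unchanged: it matches under any category header.
any_category = re.compile(
    r"Account\s(?P<Account>.+?\.com).*?High Score = (?P<Score>\d+)", re.DOTALL
)

# Variant that must start at a Category2 header (assumed header layout).
category2_only = re.compile(
    r"Category2-+\n.*?Account\s(?P<Account>.+?\.com).*?High Score = (?P<Score>\d+)",
    re.DOTALL,
)

all_pairs = any_category.findall(sample_log)
cat2_pairs = category2_only.findall(sample_log)

print(len(all_pairs))
for account, score in cat2_pairs:
    print(f"{account} = {score}")
```

On this sample the unchanged pattern returns four pairs, and the anchored variant returns only the two Category2 pairs.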

Answer 3

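You can try a simplified version of the regex: Category2-{25}\n.+Account\s+(.+)[\s\S]+?High Score = (.+)

Account\s+(.+) matches Account and one or more whitespace characters, which brings it up to the email address, then matches everything up to the end of the line and stores it in a capture group. In the sample log that is the email address together with the trailing ... on that line.

The other change is [\s\S]+?, which matches any character, including newlines, one or more times, non-greedily, until High Score is reached. The score (the text after the equals sign) is then matched and stored in the second capture group.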
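
As a rough sketch of how this pattern could be used from Python, the snippet below runs it on a shortened Category2 block. Because (.+) captures to the end of the line, the account comes back with the trailing dots from the log, so the sketch strips them with rstrip("."). The sample text, the placeholder timestamp and the variable names are assumptions for illustration.

```python
import re

prefix = "2019-10-17 18:00:00,000 :: INFO :: root :: "  # placeholder timestamp
dashes = "-" * 25
# Shortened Category2 block in the question's log layout.
sample_log = "\n".join([
    prefix + dashes + "Category2" + dashes,
    prefix + "Account 12345@email.com...",
    prefix + "other info 1",
    prefix + "High Score = 19290",
    prefix + "other info 2",
]) + "\n"

pattern = re.compile(r"Category2-{25}\n.+Account\s+(.+)[\s\S]+?High Score = (.+)")

# findall returns one (account, score) tuple per match, in group order.
raw_pairs = pattern.findall(sample_log)

# Drop the "..." that follows the address in the log line.
results = [(raw_account.rstrip("."), score) for raw_account, score in raw_pairs]

for account, score in results:
    print(f"{account} = {score}")
```

If the account line could end in something other than dots, a narrower capture such as (\S+?\.com) may be a safer choice than stripping afterwards.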

Answer 4

The code below should help. It gives you a list of tuples containing the email and the score.

import re

log_text = open(r"log.txt", "r").read()
regex = re.compile(r"Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}High Score = (?P<Score>\d{2,})", re.M)
print(regex.findall(log_text))

Output

[('12345@email.com', '19290'), ('45678@email.com', '23625')]
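
When a pattern has more than one group, findall returns one tuple per match, with the groups in order. A small sketch of turning that list into the "account = score" lines asked for in the question; the list is copied from the output above, and the variable names are made up for illustration.

```python
# Result list copied from the findall output shown above.
pairs = [('12345@email.com', '19290'), ('45678@email.com', '23625')]

# Unpack each (account, score) tuple into one formatted line.
lines = [f"{account} = {score}" for account, score in pairs]

print("\n".join(lines))
```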
