Python issue with printing regex group from log file
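I can't print two regex groups from a log file. I don't get any errors; I just don't get any results.
I want the output to read:
12345@email.com = 19290
45678@email.com = 23625
In this case I only want to print the Account and High Score data from Category 2. I'm new to Python, but I'm trying to learn more by practicing. It seems my regex doesn't return any matches in Python, yet when I try the same regex in the Regex101 tool I get both groups. Maybe the problem is how I'm printing the groups.
Any help would be appreciated so I can learn from my mistakes. :)
Here is my code: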
import re
log = open(r"C:\CurrentLog.txt","r")
regex = re.compile("Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}High Score = (?P<Score>\d{2,})", re.M)
for line in log:
    data = regex.findall(line)
    for word in data:
        print (line.group(Account))
        print (line.group(Score))
Log file sample:
The actual log file stays at around 400 to 600 lines, so I'm not worried about loading it into memory.
2019-10-17 17:56:44,295 :: INFO :: root :: -------------------------Category1-------------------------
2019-10-17 17:56:49,988 :: INFO :: root :: Account 12345@email.com...
2019-10-17 17:57:09,328 :: INFO :: root :: other info 1
2019-10-17 18:00:22,267 :: INFO :: root :: other info 2
2019-10-17 18:00:22,582 :: INFO :: root :: High Score = 19090
2019-10-17 18:00:22,582 :: INFO :: root :: other info 3
2019-10-17 18:00:22,582 :: INFO :: root :: other info 4
2019-10-17 18:00:24,661 :: INFO :: root :: -------------------------Category2-------------------------
2019-10-17 18:00:29,619 :: INFO :: root :: Account 12345@email.com...
2019-10-17 18:00:46,317 :: INFO :: root :: other info 1
2019-10-17 18:05:46,088 :: INFO :: root :: other info 2
2019-10-17 18:05:52,451 :: INFO :: root :: other info 3
2019-10-17 18:08:11,765 :: INFO :: root :: other info 4
2019-10-17 18:08:12,813 :: INFO :: root :: High Score = 19290
2019-10-17 18:08:12,814 :: INFO :: root :: other info 5
2019-10-17 18:08:12,814 :: INFO :: root :: other info 6
2019-10-17 18:08:14,890 :: INFO :: root :: -------------------------Category1-------------------------
2019-10-17 18:08:19,860 :: INFO :: root :: Account 45678@email.com...
2019-10-17 18:08:37,188 :: INFO :: root :: other info 1
2019-10-17 18:13:23,232 :: INFO :: root :: other info 2
2019-10-17 18:13:23,595 :: INFO :: root :: High Score = 23425
2019-10-17 18:13:23,595 :: INFO :: root :: other info 3
2019-10-17 18:13:23,595 :: INFO :: root :: other info 4
2019-10-17 18:13:25,689 :: INFO :: root :: -------------------------Category2-------------------------
2019-10-17 18:13:30,660 :: INFO :: root :: Account 45678@email.com...
2019-10-17 18:13:47,727 :: INFO :: root :: other info 1
2019-10-17 18:16:20,327 :: INFO :: root :: other info 2
2019-10-17 18:16:26,907 :: INFO :: root :: other info 3
2019-10-17 18:18:44,376 :: INFO :: root :: other info 4
2019-10-17 18:18:45,447 :: INFO :: root :: High Score = 23625
2019-10-17 18:18:45,447 :: INFO :: root :: other info 5
2019-10-17 18:18:45,447 :: INFO :: root :: other info 6
Let me know if you need more information or context.
Thanks!
for line in log:
    data = regex.findall(line)
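The code block above applies your regex to each line separately, which fails because your regex spans multiple lines. Since findall never matches, data is always empty and the inner loop never runs, which is also why the incorrect line.group(Account) calls never raise an error. You need to apply the regex to the entire content instead.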
The code below should work:
import re

# Read the entire file into a single string so the regex can span several lines
with open(r"log.txt", "r") as f:
    contents = f.read()

regex = re.compile(r"Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}High Score = (?P<Score>\d{2,})", re.M)

# finditer is like re.findall, except that it yields Match objects, so the named groups can be read with .group()
for match in regex.finditer(contents):
    print(match.group('Account'))
    print(match.group('Score'))
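To see why the line-by-line loop returns nothing, here is a small sketch that runs the question's pattern both ways. It assumes a hypothetical sample rebuilt from one Category2 block of the log, with a fixed timestamp of the same width so the .{51} column offset still lines up; DASHES, PREFIX and MESSAGES are illustrative names, not part of the original code.

```python
import re

DASHES = "-" * 25
PREFIX = "2019-10-17 18:00:00,000 :: INFO :: root :: "  # hypothetical fixed timestamp, 43 chars like the real log

# Hypothetical sample: one Category2 block shaped like the question's log.
MESSAGES = [
    f"{DASHES}Category2{DASHES}",
    "Account 12345@email.com...",
    "other info 1", "other info 2", "other info 3", "other info 4",
    "High Score = 19290",
    "other info 5",
]
SAMPLE_LOG = "".join(PREFIX + msg + "\n" for msg in MESSAGES)

# The question's pattern, written as a raw string so backslashes reach the regex engine intact.
regex = re.compile(
    r"Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}"
    r"High Score = (?P<Score>\d{2,})",
    re.M,
)

# Line by line: each string holds one line, so a pattern that needs "\n"
# plus several following lines can never match.
per_line = [m for line in SAMPLE_LOG.splitlines(keepends=True) for m in regex.findall(line)]

# Whole text: the pattern is free to cross line boundaries.
whole = [(m.group("Account"), m.group("Score")) for m in regex.finditer(SAMPLE_LOG)]

print(per_line)  # []
print(whole)     # [('12345@email.com', '19290')]
```

Only the source string changes between the two runs; the pattern is identical, which isolates the per-line iteration as the cause.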
I think you've overcomplicated the regex a bit. Try this:
import re

RE_PATTERN = re.compile(r'Account\s(?P<Account>.+?\.com).*?High Score = (?P<Score>\d+)', re.DOTALL)

# Read the entire log as a single string
with open(r"C:\CurrentLog.txt", "r") as log:
    for match in RE_PATTERN.finditer(log.read()):
        print(match.group('Account'))
        print(match.group('Score'))
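With re.DOTALL, . also matches \n, so .*? consumes everything up to the first occurrence of High Score =. Note that this pattern does not look for Category2, so it also returns the Account and High Score pairs from the Category1 blocks.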
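A minimal sketch of the effect of the flag, assuming a hypothetical two-block sample (one Category1 and one Category2 block, shortened, with a fixed timestamp). It runs the same pattern with and without re.DOTALL.

```python
import re

DASHES = "-" * 25
PREFIX = "2019-10-17 18:00:00,000 :: INFO :: root :: "  # hypothetical fixed timestamp

# Hypothetical sample: one Category1 block followed by one Category2 block.
MESSAGES = [
    f"{DASHES}Category1{DASHES}",
    "Account 12345@email.com...",
    "other info 1",
    "High Score = 19090",
    f"{DASHES}Category2{DASHES}",
    "Account 12345@email.com...",
    "other info 1",
    "High Score = 19290",
]
SAMPLE_LOG = "".join(PREFIX + msg + "\n" for msg in MESSAGES)

PATTERN = r"Account\s(?P<Account>.+?\.com).*?High Score = (?P<Score>\d+)"

# Without DOTALL, "." stops at "\n", so ".*?" cannot reach the High Score line.
without_dotall = re.findall(PATTERN, SAMPLE_LOG)

# With DOTALL, ".*?" may cross newlines and lazily stops at the first "High Score = ".
with_dotall = re.findall(PATTERN, SAMPLE_LOG, re.DOTALL)

print(without_dotall)  # []
print(with_dotall)     # [('12345@email.com', '19090'), ('12345@email.com', '19290')]
```

The first pair comes from the Category1 block, so extra filtering would be needed if only Category2 scores are wanted.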
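You could try a simplified version of the regex: Category2-{25}\n.+Account\s+(.+)[\s\S]+?High Score = (.+)
Account\s+(.+) matches Account followed by one or more whitespace characters, which brings it up to the email address; it then matches everything up to the newline and stores it in the first capture group. Note that this includes the trailing ..., so the captured value is 12345@email.com... rather than the bare address.
The other change is [\s\S]+?, which matches any character (including newlines), one or more times, non-greedily, until it reaches High Score. The score after the equals sign is then matched and stored in the second capture group.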
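A short sketch of the simplified pattern on a hypothetical, shortened sample (a Category1 block followed by a Category2 block), showing that only the Category2 block matches and that the trailing dots end up in the first group. Stripping them with rstrip(".") is an illustrative post-processing step, not part of the pattern itself.

```python
import re

DASHES = "-" * 25
PREFIX = "2019-10-17 18:00:00,000 :: INFO :: root :: "  # hypothetical fixed timestamp

# Hypothetical sample: the Category1 block should be skipped.
MESSAGES = [
    f"{DASHES}Category1{DASHES}",
    "Account 12345@email.com...",
    "High Score = 19090",
    f"{DASHES}Category2{DASHES}",
    "Account 12345@email.com...",
    "other info 1",
    "High Score = 19290",
]
SAMPLE_LOG = "".join(PREFIX + msg + "\n" for msg in MESSAGES)

# The simplified pattern from this answer, as a raw string.
regex = re.compile(r"Category2-{25}\n.+Account\s+(.+)[\s\S]+?High Score = (.+)")

pairs = regex.findall(SAMPLE_LOG)
print(pairs)  # [('12345@email.com...', '19290')]

# (.+) runs to the end of the line, so the trailing "..." is captured too;
# rstrip(".") removes it before printing in the format the question asked for.
cleaned = [(account.rstrip("."), score) for account, score in pairs]
for account, score in cleaned:
    print(f"{account} = {score}")  # 12345@email.com = 19290
```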
The code below should help. It gives you a list of tuples containing the email and the score.
import re

with open(r"log.txt", "r") as f:
    log_text = f.read()
regex = re.compile(r"Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}High Score = (?P<Score>\d{2,})", re.M)
print(regex.findall(log_text))
Output
[('12345@email.com', '19290'), ('45678@email.com', '23625')]
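The question asked for lines of the form email = score. When a pattern has more than one group, findall returns plain tuples, which have no .group() method; .group() belongs to the Match objects that finditer yields, and group names must be passed as strings such as 'Account'. A minimal sketch of both routes, assuming a hypothetical sample built in code (block() is an illustrative helper, and the timestamps are fixed but keep the real log's column width):

```python
import re

DASHES = "-" * 25
PREFIX = "2019-10-17 18:00:00,000 :: INFO :: root :: "  # hypothetical fixed timestamp (43 chars)


def block(category, account, score):
    """Build one hypothetical log block shaped like those in the question."""
    messages = [
        f"{DASHES}{category}{DASHES}",
        f"Account {account}...",
        "other info 1", "other info 2", "other info 3", "other info 4",
        f"High Score = {score}",
    ]
    return "".join(PREFIX + msg + "\n" for msg in messages)


log_text = (
    block("Category1", "12345@email.com", 19090)
    + block("Category2", "12345@email.com", 19290)
    + block("Category2", "45678@email.com", 23625)
)

regex = re.compile(
    r"Category2-{25}\n.{51}(?P<Account>.{11}\.com).\.\.(?:$\n^.*){5}"
    r"High Score = (?P<Score>\d{2,})",
    re.M,
)

# Route 1: findall gives (Account, Score) tuples; unpack them directly.
lines_from_findall = [f"{account} = {score}" for account, score in regex.findall(log_text)]

# Route 2: finditer gives Match objects; group names are passed as strings.
lines_from_finditer = [
    f"{m.group('Account')} = {m.group('Score')}" for m in regex.finditer(log_text)
]

for line in lines_from_finditer:
    print(line)
# 12345@email.com = 19290
# 45678@email.com = 23625
```

Both routes produce the same lines; finditer is the more convenient choice when you want to refer to groups by name.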