Extracting specific text from a log file using regex
I have the following log file:
2020-06-30 12:44:06,608 DEBUG [main] [apitests.ApiTest] Reading of Excel File Started
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] The Keyword's Entered : Asus Laptop
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] No of Keywords Entered = 1
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] Response Code from API : 200
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] Time Taken : 1959 milliseconds
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] The Result Obtained from API is : {"keywords": {"Asus Laptop": ["Premium grade"]}}
2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] --------------------------------------------------------------------------------------
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] The Keyword's Entered : Intext Hardrive
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] No of Keywords Entered = 1
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] Response Code from API : 200
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] Time Taken : 243 milliseconds
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] The Result Obtained from API is : {"keywords": {"Intext Hardrive": ["Medium grade"]}}
2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] --------------------------------------------------------------------------------------
My goal is to extract only the values ["Premium grade"], ["Medium grade"], and so on; basically, the values of the keys.
I wrote the code below.
import re
with open('quality.log', 'r') as text_file:
    text_file=text_file.read()
    for line in text_file :
        matches=re.findall(r"\[(.*?)\]", line)[0]
with open('qualitygrade.txt', 'w') as out:
    out.write('\n'.join(matches))
The goal of re.findall(r"\[(.*?)\]", line)[0] is to extract only "Premium grade", "Medium grade", etc.
I'm not sure what I'm doing wrong. My output text file is blank.
Any help, please.
This for loop overwrites matches on every iteration:
for line in text_file:
    matches=re.findall(r"\[(.*?)\]", line)[0]
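There is a second problem in the same loop: text_file.read() returns one string, and looping over a string yields single characters, so line is never a whole line and the pattern cannot match. A small sketch of the difference, using a made-up sample string rather than the real log file:

```python
import re

# Hypothetical one-line sample standing in for the log contents.
text = '[main] ["Premium grade"]\n'
pattern = r"\[(.*?)\]"

# Iterating over a string yields single characters, so nothing can match.
per_char = [re.findall(pattern, ch) for ch in text]
print(all(found == [] for found in per_char))  # True

# Iterating over splitlines() yields whole lines, which can match.
per_line = [re.findall(pattern, line) for line in text.splitlines()]
print(per_line)  # [['main', '"Premium grade"']]
```

Indexing an empty findall result with [0] would also raise an IndexError, which is another reason to drop the [0].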
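You need to either
(a) write to the output file as each match is found,
or (b) collect the matches in a separate variable.
Option (b) would look something like this: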
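import re

matches = []
with open('quality.log', 'r') as text_file:
    # Iterate over the file object itself so each item is a full line;
    # calling read() first would make the loop visit single characters.
    for line in text_file:
        # Accumulate the bracketed tokens instead of overwriting matches.
        matches += re.findall(r"\[.*?\]", line)

with open('qualitygrade.txt', 'w') as out:
    out.write('\n'.join(matches))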
You also need to fix your regex, because the one you are currently using will also capture other bracketed tokens in your log, such as [main].
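Option (a) could be sketched as follows. The function name extract_grades and the tightened pattern are illustrative assumptions, not part of the original code; the pattern only accepts a bracket that directly wraps a double-quoted string, so tokens like [main] are skipped.

```python
import re

# Matches ["..."] only: a bracket directly wrapping a double-quoted string.
GRADE_PATTERN = re.compile(r'\["(.*?)"\]')

def extract_grades(log_path, out_path):
    """Write each quoted, bracketed value to out_path as soon as it is found.

    Returns the number of values written. (Hypothetical helper.)
    """
    count = 0
    with open(log_path, 'r') as log, open(out_path, 'w') as out:
        for line in log:  # the file object yields one line at a time
            for grade in GRADE_PATTERN.findall(line):
                out.write(grade + '\n')
                count += 1
    return count
```

With the file names from the question, this would be called as extract_grades('quality.log', 'qualitygrade.txt').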
You don't need the for loop, because you are reading the whole file at once.
Your code can be:
import re
with open('quality.log', 'r') as text_file:
    text = text_file.read()  # the whole file as one string
# findall scans the entire text, so no loop is needed
matches = re.findall(r'\["(.*?)"]', text)
If you want the values between the double quotes, add the quotes to the pattern:
\["(.*?)"]
Output:
Premium grade
Medium grade
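Since the part after "The Result Obtained from API is : " looks like JSON, another option is to parse it instead of matching brackets. This is only a sketch under assumptions: the marker text is always exactly as shown in the log, the payload is a JSON object, and "keywords" maps each keyword to a list of strings. The names MARKER and grades_from_log are made up for illustration.

```python
import json

# Assumed to be the exact text that precedes the JSON payload in the log.
MARKER = "The Result Obtained from API is : "

def grades_from_log(lines):
    """Collect every value listed under "keywords" in the JSON result lines."""
    grades = []
    for line in lines:
        _, found, payload = line.partition(MARKER)
        if not found:
            continue  # not a result line
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            continue  # payload was not valid JSON; skip it
        for values in data.get("keywords", {}).values():
            grades.extend(values)
    return grades

sample = [
    '2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] Time Taken : 1959 milliseconds',
    '2020-06-30 12:44:11,853 DEBUG [main] [apitests.ApiTest] The Result Obtained from API is : {"keywords": {"Asus Laptop": ["Premium grade"]}}',
    '2020-06-30 12:44:12,136 DEBUG [main] [apitests.ApiTest] The Result Obtained from API is : {"keywords": {"Intext Hardrive": ["Medium grade"]}}',
]
print(grades_from_log(sample))  # ['Premium grade', 'Medium grade']
```

Compared with the regex, this keeps working if a value itself contains brackets or escaped quotes, but it does depend on the payload staying valid JSON.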