Python: How to use regex to find a repetitive string
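I want to extract/output some data when a keyword is found in a block of data. How can I use a regex to retrieve everything from the first "#" to the last ")"?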
//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)
Code:
import re

with open("Log_1.txt", 'r') as f:
    result = re.search('#(.*)#', f.read())
print(result.group(0))
This is not my full code, but if the keyword is "reportChange", the output should be:
# DON'T WANT #
.
.
.
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)
instead of:
# DON'T WANT #
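Assuming you want the most recent # DON'T WANT # (the one immediately before the keyword), you can use the regex #(.*)#[^)]+yourKeyWordHere[^)]+\) . In Python you can use string formatting: put {} where the keyword goes and it will be replaced with whatever word you want.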
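import re

keyword = 'reportChange'
with open("Log_1.txt", 'r') as f:
    # Raw string so '\)' reaches the regex engine without an invalid-escape warning;
    # re.escape makes any regex metacharacters in the keyword match literally
    result = re.search(r'#(.*)#[^)]+{}[^)]+\)'.format(re.escape(keyword)), f.read())
if result:
    print(result.group(0))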
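A minimal self-contained sketch of the same pattern, run against an inline copy of the two sample blocks instead of Log_1.txt (the LOG string and the find_block helper are hypothetical names for illustration). Because [^)]+ cannot step over a ')', a match cannot run past the closing line of an earlier block, so the keyword "failed" picks up the second header rather than the first:

```python
import re

# Hypothetical inline copy of the sample log, used instead of Log_1.txt
LOG = """# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)
"""


def find_block(text, keyword):
    """Return the text from a '#...#' header line to the ')' after keyword, or None."""
    # [^)]+ never matches ')', so the match cannot cross an earlier block's closing ')'
    pattern = r'#(.*)#[^)]+{}[^)]+\)'.format(re.escape(keyword))
    m = re.search(pattern, text)
    return m.group(0) if m else None


for kw in ("reportChange", "failed"):
    print(find_block(LOG, kw))
    print("---")
```

Since re.search returns None when nothing matches, the helper returns None for a keyword that does not appear in the log.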
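For the regex, you need to use a negative lookahead and a negative lookbehind.
Try this as the regex: (?!#).*(?<![)])
It should output everything between # and ).
For the future: test your regular expressions on regex101.com.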
This code prints only the data blocks that contain reportChange::someMoreInfo called with invalid some ID:
data = '''//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
'''

import re

for d in re.split(r'\n\n', data):
    g = re.findall(r'^# DON\'T WANT #.*reportChange::someMoreInfo called with invalid some ID\)$', d, flags=re.M|re.DOTALL)
    if g:
        print(g[0])
        print()
Prints:
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
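The same split-then-match idea can be wrapped in a function that takes the keyword as a parameter. This is a sketch that assumes, as re.split(r'\n\n', ...) implies, that blocks are separated by an empty line; blocks_with_keyword and DATA are hypothetical names, and re.escape is added so that a keyword containing regex metacharacters is matched literally:

```python
import re

# Sample log in the answer's format: blocks separated by one empty line
DATA = """//Log_1.txt
# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] reportChange::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345] failed::someMoreInfo called with invalid some ID)

# DON'T WANT #
{12345.54321}
[Tues Jul 2 01:23:45 2019]
< SOME_TYPE
(some_ID = [12345xxx] reportChange::someMoreInfo called with invalid some ID)
"""


def blocks_with_keyword(data, keyword):
    """Return every blank-line-separated block that mentions keyword.

    Each result runs from the header line to the ')' that ends the block.
    """
    # MULTILINE: ^ and $ match at line boundaries; DOTALL: '.' also matches newlines
    pattern = re.compile(
        r"^# DON'T WANT #.*" + re.escape(keyword) + r".*\)$",
        flags=re.MULTILINE | re.DOTALL,
    )
    found = []
    for block in re.split(r"\n\s*\n", data):
        m = pattern.search(block)
        if m:
            found.append(m.group(0))
    return found


for block in blocks_with_keyword(DATA, "reportChange"):
    print(block)
    print()
```

Splitting first keeps the greedy DOTALL .* from running from the first block's header to the last block's keyword.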