Capture ALL strings within a Python script with regex
This question was inspired by my failed attempts to adapt this answer: RegEx: Grabbing values between quotation marks
Consider the following Python script (t.py):
print("This is also an NL test")
variable = "!\n"
print('And this has an escaped quote "don\'t" in it ', variable,
      "This has a single quote ' but doesn\'t end the quote as it" + \
      " started with double quotes")
if "Foo Bar" != '''Another Value''':
    """
    This is just nonsense
    """
aux = '?'
print("Did I \"failed\"?", f"{aux}")
I want to capture all the strings in it, such as:
This is also an NL test
!\n
And this has an escaped quote "don\'t" in it
This has a single quote ' but doesn\'t end the quote as it
started with double quotes
Foo Bar
Another Value
This is just nonsense
?
Did I \"failed\"?
{aux}
I wrote another Python script using the re module. Of all my regex attempts, the one that finds the most strings is:
import re
pattern = re.compile(r"""(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)""")
with open('t.py', 'r') as f:
    msg = f.read()
x = pattern.finditer(msg, re.DOTALL)
for i, s in enumerate(x):
    print(f'[{i}]',s.group(0))
The result is:
[0] And this has an escaped quote "don\'t" in it
[1] This has a single quote ' but doesn\'t end the quote as it started with double quotes
[2] Foo Bar
[3] Another Value
[4] Did I \"failed\"?
To add to my failure, I also could not fully reproduce what I found on regex101.com.
By the way, I am using Python 3.6.9, and I am asking for more regex insight to crack this one.
Because you want to match ''' or """ or ' or " as the delimiter, put all of them into the first group:
('''|"""|["'])
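The order of the alternatives matters, because Python's re tries them left to right and keeps the first one that succeeds. A minimal sketch of the difference, assuming a made-up one-line input and a simplified `(.*?)` body instead of the final pattern:

```python
import re

# Hypothetical input: a single triple-quoted literal.
text = "'''Another Value'''"

# Triple quotes listed first, as in the group above.
triple_first = re.compile(r"""('''|\"""|["'])(.*?)\1""")
# Same alternatives, but the single-character class is tried first.
single_first = re.compile(r"""(["']|'''|\""")(.*?)\1""")

good = [m.group(2) for m in triple_first.finditer(text)]
bad = [m.group(2) for m in single_first.finditer(text)]

print(good)  # ['Another Value']
print(bad)   # ['', 'Another Value', '']
```

With the single quote tried first, the opening ''' is read as an empty string followed by a stray quote, so the triple-quoted alternatives need to come first.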
Do not put \b after that group, because then it will not match strings that start with a non-word character.
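To see the effect of \b in isolation, here is a small sketch. The helper name `opens_at_start` and the two sample inputs are only illustrative:

```python
import re

# Opening quote followed by a word boundary, as in the question's attempt.
with_boundary = re.compile(r"""(["'])\b""")
# Opening quote alone.
without_boundary = re.compile(r"""(["'])""")

word_start = '"Foo Bar"'   # first character after the quote is a word character
symbol_start = '"!\\n"'    # first character after the quote is "!"

def opens_at_start(pattern, text):
    """Return True if the pattern matches an opening quote at index 0."""
    return pattern.match(text) is not None

print(opens_at_start(with_boundary, word_start))       # True
print(opens_at_start(with_boundary, symbol_start))     # False
print(opens_at_start(without_boundary, symbol_start))  # True
```

A word boundary only exists between a word character and a non-word character, so a quote followed by `!`, `?`, a space or a newline is never accepted as an opening delimiter when \b is present.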
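Because you want to make sure that the closing delimiter is not treated as an opening delimiter when the engine starts the next iteration, you need to fully match it (not just look ahead for it).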
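The difference between looking ahead for the closing delimiter and consuming it can be sketched like this, assuming a made-up input line and a simplified `(.*?)` body:

```python
import re

# Hypothetical input with two adjacent string literals.
text = 'a = "x" + "y"'

# The closing quote is only asserted, so it stays available for the next match.
lookahead_close = re.compile(r"""(["'])(.*?)(?=\1)""")
# The closing quote is consumed as part of the match.
consumed_close = re.compile(r"""(["'])(.*?)\1""")

with_lookahead = [m.group(2) for m in lookahead_close.finditer(text)]
with_consume = [m.group(2) for m in consumed_close.finditer(text)]

print(with_lookahead)  # ['x', ' + ', 'y']
print(with_consume)    # ['x', 'y']
```

With the lookahead, the quote that closes "x" is reused as an opening quote, so the text between the two literals is reported as if it were a string.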
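The middle part, which matches anything up to the closing delimiter (treating a backslash plus the character after it as one unit), can be:
((?:\\.|.)*?)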
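As a rough illustration of why the `\\.` alternative comes first, the sketch below compares a naive body with the escape-aware one on a single line taken from t.py. The double-quote delimiter is hard-coded here to keep the example small:

```python
import re

# One line from t.py, kept as a raw string so the backslashes stay literal.
text = r'print("Did I \"failed\"?")'

# Naive body: stops at the first double quote, even an escaped one.
naive = re.compile(r'"(.*?)"')
# Escape-aware body: a backslash always takes the next character with it.
escape_aware = re.compile(r'"((?:\\.|.)*?)"')

naive_result = naive.search(text).group(1)
aware_result = escape_aware.search(text).group(1)

print(naive_result)  # Did I \
print(aware_result)  # Did I \"failed\"?
```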
All together, with \1 consuming the same delimiter that opened the string:
('''|"""|["'])((?:\\.|.)*?)\1
The result you want will be in the second capture group:
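import re
# (?s) lets "." match newlines; \1 must repeat the delimiter captured in group 1.
pattern = re.compile(r"""(?s)('''|\"""|["'])((?:\\.|.)*?)\1""")
with open('t.py', 'r') as f:
    msg = f.read()
x = pattern.finditer(msg)
for i, s in enumerate(x):
    # Group 2 holds the string body without its delimiters.
    print(f'[{i}]', s.group(2))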
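For a quick check without a file on disk, the same pattern can be run against an in-memory sample. This is only a sketch: the `sample` text is a shortened, made-up variant of t.py, not the original script.

```python
import re

pattern = re.compile(r"""(?s)('''|\"""|["'])((?:\\.|.)*?)\1""")

# Hypothetical stand-in for the contents of t.py, built line by line
# so that every kind of delimiter appears at least once.
sample = "\n".join([
    r'''variable = "!\n"''',
    r'''print('escaped "don\'t" here', "it's fine")''',
    r"""if "Foo Bar" != '''Another Value''':""",
    "    aux = '?'",
])

# Group 2 is the string body; group 1 is the delimiter that opened it.
found = [m.group(2) for m in pattern.finditer(sample)]

for i, body in enumerate(found):
    print(f'[{i}]', body)
```

Note that the `finditer` method of a compiled pattern takes a start position as its second argument, not flags, which is why DOTALL is written inline as `(?s)` instead of being passed in the call. The pattern also knows nothing about comments or string prefixes such as `f`, so a quote inside a comment would still be treated as a delimiter.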