How to get nested groups with regexp
I need your help with the following regex.
I have this text:
"[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
Using a regex, I want to get:
[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]
The following regex, (\[[^\[$\]\]]*\]), gives me:
[Hello|Hi]
[inviting | calling]
[junior| mid junior]
So how should I fix it to get the correct output?
Let's define your string and import re:
>>> s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
>>> import re
Now, try:
>>> re.findall(r'\[ (?:[^][]* \[ [^][]* \])* [^][]* \]', s, re.X)
['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']
In more detail
Consider this script:
$ cat script.py
import re
s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
matches = re.findall(r'''\[       # Opening bracket
    (?:[^][]* \[ [^][]* \])*      # Zero or more repetitions of: non-bracket characters, a [, non-bracket characters, a ]
    [^][]*                        # Zero or more non-bracket characters
    \]                            # Closing bracket
    ''',
    s,
    re.X)
print('\n'.join(matches))
This produces the output:
$ python script.py
[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]
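Note that the pattern above only allows one level of nesting inside the outer brackets. As a rough sketch of how it could be extended to a fixed number of levels with the standard re module, the pattern can be wrapped around itself repeatedly. The helper name bracket_pattern and its depth parameter are made up for this illustration and are not part of the answer above:

```python
import re

def bracket_pattern(depth):
    """Build a regex source matching [...] with at most `depth` nested levels.

    Hypothetical helper for illustration only.
    """
    # Level 0: a bracket pair that contains no other brackets.
    pattern = r'\[[^][]*\]'
    for _ in range(depth):
        # Each pass allows one more level: non-bracket text and
        # groups of the previous level may alternate inside the pair.
        pattern = r'\[(?:[^][]*' + pattern + r')*[^][]*\]'
    return pattern

s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."

# depth=1 builds the same pattern as in the script above (without the whitespace).
one_level = re.findall(bracket_pattern(1), s)
# A string nested three levels deep needs depth=2 to be matched as a whole.
two_levels = re.findall(bracket_pattern(2), "[sd[sd[sd][sd]]]")
print(one_level)
print(two_levels)
```

The depth still has to be chosen in advance, so this only helps when the maximum nesting of the input is known.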
You can use the following code with the PyPI regex module and the PCRE-like regex r'\[(?:[^][]++|(?R))*]':
>>> import regex
>>> s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
>>> r = regex.compile(r'\[(?:[^][]++|(?R))*]')
>>> print(r.findall(s))
['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']
>>>
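See the regex demo.
The pattern \[(?:[^][]++|(?R))*] matches [, then zero or more occurrences of either a run of one or more characters other than ] and [ or a whole bracketed expression [...] (matched recursively via (?R)), and then the closing ].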
You can use a simple stack instead of a recursive regex:
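x = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer.[sd[sd[sd][sd]]]"
l = []        # completed top-level groups
st = []       # stack of currently open brackets
start = None  # index where the current top-level group starts
for i, j in enumerate(x):
    if j == '[':
        if not st:  # empty stack: this bracket opens a top-level group
            start = i
        st.append(j)
    elif j == ']':
        st.pop()  # assumes the brackets in x are balanced
        if not st:  # stack emptied: the top-level group just closed
            l.append(x[start:i+1])
print(l)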
Output: ['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]', '[sd[sd[sd][sd]]]']
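The same idea can be packaged as a reusable function. The sketch below is one possible variant, not the code from the answer: it keeps a depth counter instead of a list, and it assumes that a stray ] should be ignored and an unclosed [ should be dropped. The name top_level_groups is made up for this example:

```python
def top_level_groups(text):
    """Return every outermost [...] span in text, at any nesting depth.

    Hypothetical helper; unmatched brackets are skipped by assumption.
    """
    groups = []
    depth = 0      # number of brackets currently open
    start = None   # index of the current outermost opening bracket
    for i, ch in enumerate(text):
        if ch == '[':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ']' and depth > 0:  # a stray ] at depth 0 is ignored
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups

s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
print(top_level_groups(s))
```

A counter is enough here because only one kind of bracket is tracked; a real stack would matter if several bracket types had to be matched against each other.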