Multiple capturing groups within non-capturing group using Python regexes

I have the following code, which uses multiple capturing groups inside a non-capturing group:

>>> import re
>>> regex = r'(?:a ([ac]+)|b ([bd]+))'
>>> re.match(regex, 'a caca').groups()
('caca', None)
>>> re.match(regex, 'b bdbd').groups()
(None, 'bdbd')

How can I change the code so that it outputs either ('caca',) or ('bdbd',), without the None entry?

Look at the problem the other way around:

((?:a [ac]+)|(?:b [bd]+))
^ ^         ^ ^
| |         | other exact match
| |         OR
| not capturing for exact match
capture everything

Easier to look at here: https://regex101.com/r/e3bK2B/1/
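One thing worth checking before adopting this pattern: the outer group captures the whole alternative, so the result includes the "a " or "b " prefix. A minimal sketch of that behaviour follows; the helper `value_after_prefix` is a hypothetical name introduced here for illustration and assumes the prefix is always one letter followed by one space.

```python
import re

# Pattern from the answer above: one outer capturing group around both alternatives.
outer = re.compile(r'((?:a [ac]+)|(?:b [bd]+))')

def value_after_prefix(text):
    """Hypothetical helper: return the part after the 'a ' / 'b ' prefix, or None."""
    match = outer.match(text)
    if match is None:
        return None
    # group(1) is the full alternative, e.g. 'a caca'; drop everything up to the first space.
    return match.group(1).split(' ', 1)[1]

print(outer.match('a caca').groups())   # the prefix is part of the capture
print(value_after_prefix('a caca'))
print(value_after_prefix('b bdbd'))
```

If the prefix must not appear in the group at all, the approaches in the other answers avoid the extra string handling.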

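You are close.

To always get the capture as group 1, you can use a lookahead to do the matching and then a separate capturing group to do the capturing: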

(?:a (?=[ac]+)|b (?=[bd]+))(.*)


Or in Python 3:

>>> import re
>>> regex = r'(?:a (?=[ac]+)|b (?=[bd]+))(.*)'
>>> re.match(regex, 'a caca').groups()
('caca',)
>>> re.match(regex, 'b bdbd').groups()
('bdbd',)
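Note that the lookahead only checks that at least one allowed character follows the prefix, while `(.*)` captures the rest of the line whatever it contains. The sketch below shows this, together with one possible tightening that anchors the lookahead with `$`. The stricter pattern is an assumption about the intent (that the entire remainder must consist of the allowed characters), and `capture` is a hypothetical helper name.

```python
import re

# Pattern from the answer: the lookahead validates, (.*) captures.
loose = re.compile(r'(?:a (?=[ac]+)|b (?=[bd]+))(.*)')

# Assumed stricter variant: the lookahead must reach the end of the string.
strict = re.compile(r'(?:a (?=[ac]+$)|b (?=[bd]+$))(.*)')

def capture(pattern, text):
    """Hypothetical helper: return group 1 of a match at the start of text, or None."""
    match = pattern.match(text)
    return match.group(1) if match else None

print(capture(loose, 'a caxyz'))   # trailing 'xyz' is captured as well
print(capture(strict, 'a caxyz'))  # rejected, because 'xyz' is not in [ac]
print(capture(strict, 'a caca'))
print(capture(strict, 'b bdbd'))
```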

You can use a branch reset group with the PyPI regex module:

Alternatives inside a branch reset group share the same capturing groups. The syntax is (?|regex) where (?| opens the group and regex is any regular expression. If you don’t use any alternation or capturing groups inside the branch reset group, then its special function doesn’t come into play. It then acts as a non-capturing group.

The regex looks like

(?|a ([ac]+)|b ([bd]+))

See the Python 3 demo:

import regex
rx = r'(?|a ([ac]+)|b ([bd]+))'
print(regex.search(rx, 'a caca').groups())  # => ('caca',)
print(regex.search(rx, 'b bdbd').groups())  # => ('bdbd',)
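If installing the third-party regex module is not an option, a similar result can be approximated with the standard `re` module by keeping the original two-group pattern and picking whichever group took part in the match. This is only a sketch of a workaround, not a branch reset; `first_matched_group` is a hypothetical helper name.

```python
import re

# Original pattern from the question: each alternative has its own group.
pattern = re.compile(r'(?:a ([ac]+)|b ([bd]+))')

def first_matched_group(text):
    """Hypothetical helper: return the one group that participated, or None."""
    match = pattern.match(text)
    if match is None:
        return None
    # Exactly one alternative matches, so exactly one group is not None.
    return next((g for g in match.groups() if g is not None), None)

print(first_matched_group('a caca'))
print(first_matched_group('b bdbd'))
```

The trade-off is that the caller gets a plain string rather than a one-element tuple from `groups()`.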

Another option is to get the match with lookbehinds instead of capturing groups:

(?<=a )[ac]+|(?<=b )[bd]+


For example

import re

pattern = r'(?<=a )[ac]+|(?<=b )[bd]+'
print(re.search(pattern, 'a caca').group())
print(re.search(pattern, 'b bdbd').group())

Output

caca
bdbd
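Because the lookbehind pattern has no groups, `re.findall` returns the matched text directly, which can be handy when one string contains several prefixed values. A small sketch follows; the input string is made up for illustration. Python's `re` only accepts fixed-width lookbehinds, which holds here since both prefixes are exactly two characters long.

```python
import re

pattern = re.compile(r'(?<=a )[ac]+|(?<=b )[bd]+')

# Made-up input containing both kinds of prefixed values.
text = 'a caca b bdbd'

# With no capturing groups, findall yields the full text of each match.
values = pattern.findall(text)
print(values)
```

Keep in mind that the lookbehind is not anchored, so the prefix may appear anywhere in the string, including at the end of a longer word.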