Is there a way to unify regex capture groups in distinct alternative branches?

Question

Suppose I have code like this:

import re

rx = re.compile(r'(?:(\w{2}) (\d{2})|(\w{4}) (\d{4}))')
def get_id(s):
    g = rx.match(s).groups()
    return (
        g[0] if g[0] is not None else g[2],
        int(g[1] if g[1] is not None else g[3]),
    )

print(get_id('AA 12'))      # ('AA', 12)
print(get_id('BBBB 1234'))  # ('BBBB', 1234)

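This does what I want, but it forces me to inspect every capture group to find out which one actually captured the substring. That gets unwieldy when there are many alternatives, so I would rather avoid it.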

I tried named capture groups, but (?:(?P<s>\w{2}) (?P<id>\d{2})|(?P<s>\w{4}) (?P<id>\d{4})) just raises an error.
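
The failure is easy to reproduce: Python's built-in re module refuses to compile a pattern that defines the same group name twice, even when the definitions sit in mutually exclusive branches. A minimal sketch (the variable names are only illustrative):

```python
import re

# The pattern from the question, with both branches reusing the names "s" and "id".
pattern = r'(?:(?P<s>\w{2}) (?P<id>\d{2})|(?P<s>\w{4}) (?P<id>\d{4}))'

try:
    re.compile(pattern)
    compiled = True
except re.error as exc:
    # re rejects the duplicate name at compile time, before any matching happens.
    compiled = False
    print('compile failed:', exc)
```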

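The lookahead trick from another answer does not work either, because (\w{2}(?= \d{2})|\w{4}(?= \d{4})) (\d{2}|\d{4}) would capture the wrong number of digits, and for reasons I would rather not go into, I cannot hand-optimize the order of the alternatives.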
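
To make the miscount concrete, here is a small check of that lookahead pattern. The lookaheads pick the right prefix, but the digit group is free to take its first alternative regardless of which prefix matched (rx_lookahead is just a name chosen for this sketch):

```python
import re

rx_lookahead = re.compile(r'(\w{2}(?= \d{2})|\w{4}(?= \d{4})) (\d{2}|\d{4})')

# The two-character case happens to work.
print(rx_lookahead.match('AA 12').groups())      # ('AA', '12')
# The four-character case keeps only the first two digits.
print(rx_lookahead.match('BBBB 1234').groups())  # ('BBBB', '12')
```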

Is there a more idiomatic way to write this?

Answer 1

It seems there is! From the re documentation:

        (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                           the (optional) no pattern otherwise.
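
As a minimal illustration of the construct on its own, separate from the ID problem, the example pattern from the re documentation uses a conditional to require a closing bracket only when an opening one was matched. The helper name is_balanced is made up for this sketch:

```python
import re

# Group 1 optionally matches '<'. The conditional (?(1)>|$) then requires
# '>' if group 1 participated in the match, and end of string otherwise.
rx_email = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

def is_balanced(s):
    return rx_email.match(s) is not None

print(is_balanced('<user@host.com>'))  # True
print(is_balanced('user@host.com'))    # True
print(is_balanced('<user@host.com'))   # False
print(is_balanced('user@host.com>'))   # False
```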

Applied to the example, that gives:

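import re

# The empty groups prefix_two / prefix_four only record which branch matched.
# Each conditional then contributes its digit pattern only if its marker
# group participated; an unmatched marker makes the conditional match nothing.
rx = re.compile(
    r'(?P<prefix>(?P<prefix_two>)\w{2}(?= \d{2})|(?P<prefix_four>)\w{4}(?= \d{4}))'
    r' (?P<digits>(?(prefix_two)\d{2})(?(prefix_four)\d{4}))'
)
def get_id(s):
    m = rx.match(s)
    if not m:
        return (None, None)
    # Groups are returned as strings, so convert the digits explicitly.
    return (m.group('prefix'), int(m.group('digits')))

print(get_id('AA 12'))      # ('AA', 12)
print(get_id('BB 1234'))    # ('BB', 12)
print(get_id('BBBB 12'))    # (None, None)
print(get_id('BBBB 1234'))  # ('BBBB', 1234)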

Whether it is worth the trouble, I leave to the reader.
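
For comparison, when every branch contains the same number of plain capture groups, one could keep the question's original pattern and simply drop the groups that did not participate. This is only a sketch under that assumption (it would break if a branch contained optional groups), and get_id_filtered is a hypothetical name:

```python
import re

rx_plain = re.compile(r'(?:(\w{2}) (\d{2})|(\w{4}) (\d{4}))')

def get_id_filtered(s):
    m = rx_plain.match(s)
    if m is None:
        return (None, None)
    # Exactly one branch matched, so only its two groups are not None.
    prefix, digits = (g for g in m.groups() if g is not None)
    return (prefix, int(digits))

print(get_id_filtered('AA 12'))      # ('AA', 12)
print(get_id_filtered('BBBB 1234'))  # ('BBBB', 1234)
```

This scales to more alternatives without naming each branch, at the cost of losing the link between a value and the branch it came from.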
