Regex: How to repeat group area?

How can I repeat part of a pattern without writing it out multiple times?

For example:

txt = '1. Reserve December 31, prior year.................................................................................................................. ..4,587,658,997 .......................... .1,030,275,014 .....136,963,988 .......................... .3,276,184,545 .....144,235,450 .......................... .......................... .......................... .......................... ..........................'

splitter = '^([\d.]+)(.*?)\.\s\.[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?'

parts = re.match(splitter, txt, re.DOTALL)

The first part of my regex, ^([\d.]+)(.*?)\.\s\. captures the line number and the title:

  1. Reserve December 31, prior year

After that, I have to repeat the following part 12 times so that the numbers after the title produce 12 additional groups: [\.\s]+(\(*\d[\d,.]*\)*)?

If a line has fewer than 12 numbers, the missing ones come back as None for that particular match.

Is there a way to repeat this expression 12 times without writing such a long regex? I tried (?:[\.\s]+(\(*\d[\d,.]*\)*)?){12} but had no luck.

I think you just need to put the second part in a group by wrapping it in (), and then put the exact count outside the group, like this:

import re

checkIt = re.compile(r'^([\d.]+)(.*?)\.\s\.([\.\s]+(\(*\d[\d,.]*\)*)?){12}')
if checkIt.match(txt):  # txt is the input line from the question
    print('matched')  # replace with your own handling

You can also use the regex module from PyPI to get the separate values, looping over the captures collection to read the individual numbers.
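A likely reason the {12} attempt in the question seemed not to work: in Python's built-in re module, a capturing group inside a repetition keeps only the text of its last iteration, so the earlier numbers are lost. If 12 separate groups are wanted, one option is to build the long pattern by string repetition instead of typing it out. The sketch below reuses the question's sub-patterns; the names header, number_part and sample are illustrative, and sample is a shortened, made-up line rather than the original data.

```python
import re

# With built-in re, a repeated capturing group keeps only its last iteration.
last_only = re.match(r'(?:\s*(\d+)){3}', '10 20 30')
print(last_only.group(1))  # 30

# Build the 12-group pattern by repeating the number part as a string.
header = r'^([\d.]+)(.*?)\.\s\.'
number_part = r'[\.\s]+(\(*\d[\d,.]*\)*)?'
splitter = re.compile(header + number_part * 12, re.DOTALL)

# Hypothetical, shortened sample line: three numbers, then dot filler.
sample = ('1. Reserve prior year.... ..4,587,658,997 ..... .1,030,275,014'
          ' .....136,963,988' + ' .....' * 9)

parts = splitter.match(sample)
# Groups 3 to 14 hold the numbers; fields without a number are None.
numbers = [g for g in parts.groups()[2:] if g is not None]
print(splitter.groups)  # 14
print(numbers)
```

The compiled pattern is identical to the hand-written one, so each number still lands in its own group and empty fields still come back as None.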

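You can add an anchor $ for the end of the string and repeat matching a number 12 times.

If there are not 12 numbers, the match is None.

Instead of using (.*?), you can make the pattern a bit more specific, which reduces backtracking when there is no match.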

^(\d+\.)\s(\w+(?:\s+[\w,]+)*)(?:[.\s]+\.(\d+(?:[.,]\d+)*)){12}+[.\s]*$
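Spelled out piece by piece, the pattern reads as follows. This is a sketch using built-in re with re.VERBOSE; the possessive + after {12} is left out because built-in re accepts it only from Python 3.11 on, and it is there mainly to limit backtracking. The helper make_line and the lines it builds are made up for illustration.

```python
import re

pattern = re.compile(r"""
    ^(\d+\.)\s                  # group 1: line number such as "1."
    (\w+(?:\s+[\w,]+)*)         # group 2: title words, commas allowed
    (?:                         # one number field, repeated below
        [.\s]+\.                #   dot/space filler ending in a dot
        (\d+(?:[.,]\d+)*)       #   group 3: digits with , or . separators
    ){12}                       # exactly 12 number fields
    [.\s]*$                     # only filler may remain before the end
""", re.VERBOSE)


def make_line(count):
    """Build a hypothetical line with `count` number fields."""
    fields = "".join(f" ....{100 + i}" for i in range(count))
    return "1. Reserve December 31, prior year" + fields + " ....."


print(bool(pattern.match(make_line(12))))  # True
print(pattern.match(make_line(11)))        # None
```

Because of the trailing $, a line with more than 12 numbers is rejected as well, not only a line with fewer.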


For example:

import regex

pattern = r"^(\d+\.)\s(\w+(?:\s+[\w,]+)*)(?:[.\s]+\.(\d+(?:[.,]\d+)*)){12}+[.\s]*$"
s = "1. Reserve December 31, prior year.................................................................................................................. ..4,587,658,997 .......................... .1,030,275,014 .....136,963,988 .......................... .3,276,184,545 .....144,235,450 .......................... .......................... .......................... .......................... ............................................................................................................................................ ..4,587,658,997 .......................... .1,030,275,014 .....136,963,988 .......................... .3,276,184,545 .....144,235,450 .......................... .......................... .......................... .......................... ............................................................................................................................................ ..4,587,658,997 .......4,587,658,997 ....."
m = regex.match(pattern, s)

if m:
    print(m.group(1))
    print(m.group(2))
    for c in m.captures(3):
        print(c)

Output

1.
Reserve December 31, prior year
4,587,658,997
1,030,275,014
136,963,988
3,276,184,545
144,235,450
4,587,658,997
1,030,275,014
136,963,988
3,276,184,545
144,235,450
4,587,658,997
4,587,658,997
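If installing the third-party regex module is not an option, note that built-in re has no captures() equivalent, because a repeated group keeps only its last iteration. One possible workaround, sketched here as an assumption rather than as part of the answer above, is to validate the line with the same pattern and then collect the numbers from the rest of the line with re.findall. The names NUMBER, LINE and split_line are illustrative, and the input line is made up.

```python
import re

NUMBER = r"\d+(?:[.,]\d+)*"
LINE = re.compile(
    r"^(\d+\.)\s(\w+(?:\s+[\w,]+)*)(?:[.\s]+\.(" + NUMBER + r")){12}[.\s]*$"
)


def split_line(line):
    """Return (line_number, title, numbers), or None if the line does not match."""
    m = LINE.match(line)
    if m is None:
        return None
    # After the title, a matching line holds only filler and the 12 numbers.
    numbers = re.findall(NUMBER, line[m.end(2):])
    return m.group(1), m.group(2), numbers


# Hypothetical input: 12 number fields separated by dot filler.
fields = "".join(f" ....{i},000" for i in range(1, 13))
line = "1. Reserve December 31, prior year" + fields + " ....."
print(split_line(line))
```

The match guarantees the shape of the line, so the findall pass only has to pick out the number tokens.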