Regex: How to repeat group area?
How can I repeat part of a pattern without writing it out multiple times?
For example:
import re
txt = '1. Reserve December 31, prior year.................................................................................................................. ..4,587,658,997 .......................... .1,030,275,014 .....136,963,988 .......................... .3,276,184,545 .....144,235,450 .......................... .......................... .......................... .......................... ..........................'
splitter = r'^([\d.]+)(.*?)\.\s\.[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?[\.\s]+(\(*\d[\d,.]*\)*)?'
parts = re.match(splitter, txt, re.DOTALL)
The first part of my regex, ^([\d.]+)(.*?)\.\s\. , captures the line number and the title:
- Reserve December 31, prior year
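After that, I have to repeat this part 12 times so that the numbers following the title produce 12 additional groups: [\.\s]+(\(*\d[\d,.]*\)*)?
If there are not 12 numbers, the group returns None for that particular match.
Is there a way to repeat this expression 12 times without writing out such a long regex? I tried (?:[\.\s]+(\(*\d[\d,.]*\)*)?){12} but no dice.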
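I think you just need to put your second part in a group by wrapping it in (), and then put your exact count outside the group, like this:
checkIt = re.compile(r'^([\d.]+)(.*?)\.\s\.([\.\s]+(\(*\d[\d,.]*\)*)?){12}')
if checkIt.match(text):
    pass  # do something
You can also use the regex module from PyPI to get the separate values, looping over the captures of the group to read the individual numbers.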
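To illustrate why the captures matter: with the standard re module, a repeated capture group only keeps its last repetition, so the counted group is good for validating the line but not for reading every number. Here is a minimal sketch, assuming a shortened, made-up sample line with three numbers instead of twelve (line, shape, last_only and numbers are illustrative names, not from the question). A second pass with re.findall collects the individual values:

```python
import re

# Hypothetical short sample: a title followed by three numbers and dot leaders.
line = "1. Title..... ..10 ..... .20 ....30"

# The counted group checks that exactly three numbers follow the title.
shape = re.compile(r"^(\d+\.)\s(\w+)(?:[.\s]+\.(\d+)){3}$")
m = shape.match(line)

# The re module keeps only the last repetition of a repeated group.
last_only = m.group(3) if m else None

# A second pass over the text after the title collects every number.
numbers = re.findall(r"\d+", line[m.end(2):]) if m else []

print(last_only)
print(numbers)
```

If installing a third-party package is not an option, this validate-then-findall split is one possible way to get a similar result to the captures of the regex module.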
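You can add an anchor $ at the end of the pattern to assert the end of the string, and repeat matching a number 12 times.
If there are not 12 numbers, the match is None.
Instead of using (.*?), you can make the pattern a bit more specific, which reduces backtracking when there is no match.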
^(\d+\.)\s(\w+(?:\s+[\w,]+)*)(?:[.\s]+\.(\d+(?:[.,]\d+)*)){12}+[.\s]*$
For example:
import regex
pattern = r"^(\d+\.)\s(\w+(?:\s+[\w,]+)*)(?:[.\s]+\.(\d+(?:[.,]\d+)*)){12}+[.\s]*$"
s = "1. Reserve December 31, prior year.................................................................................................................. ..4,587,658,997 .......................... .1,030,275,014 .....136,963,988 .......................... .3,276,184,545 .....144,235,450 .......................... .......................... .......................... .......................... ............................................................................................................................................ ..4,587,658,997 .......................... .1,030,275,014 .....136,963,988 .......................... .3,276,184,545 .....144,235,450 .......................... .......................... .......................... .......................... ............................................................................................................................................ ..4,587,658,997 .......4,587,658,997 ....."
m = regex.match(pattern, s)
if m:
    print(m.group(1))
    print(m.group(2))
    # captures() returns every repetition of group 3, not just the last one
    for c in m.captures(3):
        print(c)
Output
1.
Reserve December 31, prior year
4,587,658,997
1,030,275,014
136,963,988
3,276,184,545
144,235,450
4,587,658,997
1,030,275,014
136,963,988
3,276,184,545
144,235,450
4,587,658,997
4,587,658,997
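The "no match when the count is wrong" behaviour can be checked on a smaller scale with the standard re module. The sketch below is only an illustration under assumptions: build_pattern and its count parameter are hypothetical helpers, the sample lines are made up, and the possessive {12}+ is replaced by a plain counted quantifier so the pattern also compiles on Python versions whose re lacks possessive quantifiers.

```python
import re

def build_pattern(count):
    # count (hypothetical parameter): how many numbers a line must contain.
    return re.compile(
        r"^(\d+\.)\s(\w+(?:\s+[\w,]+)*)"
        r"(?:[.\s]+\.(\d+(?:[.,]\d+)*)){%d}[.\s]*$" % count
    )

two = build_pattern(2)

# Two numbers after the title: the anchored pattern matches.
ok = two.match("1. Prior year..... ..4,587 ..... .1,030 .....")

# Only one number: the counted group cannot repeat twice, so the result is None.
short = two.match("1. Prior year..... ..4,587 ..... .....")

print(ok.group(2) if ok else None)
print(short)
```

Building the pattern from a count keeps the regex short while still enforcing the exact number of values per line.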