python - re.findall how to separate content into groups

Question

I need an explanation of how the regular expression I pass to re.findall works.

pattern = re.compile(r'(?<=\\\[-16pt]\n)([\s\S]*?)(?=\\\n\thinhline)')
content = ' '.join(re.findall(pattern, content))

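So the above prints everything the pattern matches, which begins with \[-16pt] and ends with '\\n thinhline', plus all the text after it. Suppose I have the following content matched by the pattern: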

\[-16pt]
x = 10
print ("hi")
\
\thinhline
\[-16pt]
y = 3
print ("bye")
\
\thinhline
\[-16pt]
z = 7
print ("zap")
\
\thinhline
This is random text.
All of this is matched by re.findall, even though it is not included within the pattern.
xyz = "xyz"

How can I separate each group so that I can edit them independently:

Group 1:

x = 10
print ("hi")

Group 2:

y = 3
print ("bye")

Group 3:

z = 7
print ("zap")

with none of the extra content after them being matched?

Thanks.

Answer 1

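import re

# test_str is the text to search.
s = re.findall(r"(?<=\\\[-16pt]\n)([\s\S]*?)(?=\\\n\thinhline)", test_str)

findall returns a list of the matched groups.

Here s is a list. You can access the item you want by referring to s[0] or s[1].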
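To make the indexing concrete, here is a minimal sketch. It assumes the text really contains a literal \[-16pt] line at the start of each block and a lone backslash line followed by \thinhline at the end; the sample text and the escaping of the delimiters are my assumptions, not taken from the question's actual file.

```python
import re

# Assumed sample text (raw string, so backslashes are kept literally).
test_str = r"""\[-16pt]
x = 10
print ("hi")
\
\thinhline
\[-16pt]
y = 3
print ("bye")
\
\thinhline
This is random text.
"""

# \\thinhline matches a literal backslash followed by "thinhline";
# a bare \t in the pattern would mean a tab character instead.
pattern = r"(?<=\\\[-16pt\]\n)([\s\S]*?)(?=\n\\\n\\thinhline)"

s = re.findall(pattern, test_str)

print(len(s))  # number of blocks found
print(s[0])    # first block
print(s[1])    # second block
```

Because the pattern has exactly one capturing group, each element of s is a plain string holding one block, and the trailing "This is random text." line is not part of any element.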

Answer 2

Consider the following runnable program:

import re

# Raw string, so every backslash in the sample text is kept literally.
content = r"""\[-16pt]
x = 10
print ("hi")
\
thinhline
\[-16pt]
y = 3
print ("bye")
\
thinhline
\[-16pt]
z = 7
print ("zap")
\
thinhline
This is random text.
"""

pattern = re.compile(r"""(\\\[-16pt\]\n)    # Start. Don't technically need to capture.
                         (.*?)              # What we want. Must capture ;)
                         (\n\\\nthinhline)  # End. Also don't really need to capture.
                      """, re.X | re.DOTALL)

for m in pattern.finditer(content):
    print("Matched:\n----\n%s\n----\n" % m.group(2))

When run, it outputs:

Matched:
----
x = 10
print ("hi")
----

Matched:
----
y = 3
print ("bye")
----

Matched:
----
z = 7
print ("zap")
----

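Notes:

By using the re.X option, the expression can span multiple lines and carry comments.
By using the re.DOTALL option, the excess backslashes can be dropped: the ".*?" group (that is, "non-greedily take every character up to the next match") then includes newlines, so [\s\S] is no longer needed.
I used finditer rather than findall. Technically that moves away from your question, but you want to work with each match, so I figured it was a good approach.
I took the \t off thinhline because I wasn't sure whether it meant a tab or backslash-then-t. It doesn't affect the above much, but I wanted to be clear.
I captured the start and end groups only for demonstration. Only the middle one really needs to be a group.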
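If you would rather stay with findall, a sketch along these lines should also work. It assumes the same sample text shape as the program above (thinhline without the leading backslash); the variable name blocks and the "goodbye" edit are only illustrative.

```python
import re

# Assumed sample text, shortened to two blocks.
content = r"""\[-16pt]
x = 10
print ("hi")
\
thinhline
\[-16pt]
y = 3
print ("bye")
\
thinhline
This is random text.
"""

pattern = re.compile(r"""
    (?:\\\[-16pt\]\n)     # start delimiter, non-capturing
    (.*?)                 # block body, the only captured group
    (?:\n\\\nthinhline)   # end delimiter, non-capturing
    """, re.X | re.DOTALL)

# With exactly one capturing group, findall returns a list of strings.
blocks = pattern.findall(content)

# Each block is its own string, so one can be edited without touching the others.
blocks[1] = blocks[1].replace("bye", "goodbye")

for block in blocks:
    print(block)
    print("----")
```

Making the delimiters non-capturing with (?:...) keeps them out of the result, so there is no need to pick a group number afterwards.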

Answer 3

Using re.findall:

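import re

# content is the sample text from the question, with thinhline written
# without its leading backslash.
pattern = r"""(\\\[-16pt\])(\n[a-z]+\s+=\s+[0-9]+)(\nprint\s+\("[a-z]+"\))\n(\\\nthinhline)\n"""
result = re.findall(pattern, content)
for count, item in enumerate(result, 1):
    print("row", count, item[2])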

Output:

row 1 
print ("hi")
row 2 
print ("bye")
row 3 
print ("zap")
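The reason item[2] works in the loop is that findall returns a list of tuples, one tuple per match, whenever the pattern has more than one capturing group. A small sketch of that behaviour, using a made-up two-group pattern rather than the one above:

```python
import re

# Hypothetical input: two simple assignments.
text = "x = 10\ny = 3\n"

# Two capturing groups, so each match becomes a 2-tuple.
pairs = re.findall(r"([a-z]+)\s+=\s+([0-9]+)", text)

for name, value in pairs:
    print(name, value)
```

Here pairs[0][0] is the variable name of the first match and pairs[0][1] is its value, in the same way that item[2] picks the third group in the loop above.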

python

regex

python-2.7