Python: how to create blocks from multiline text

Question

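I have the block of text below, which I am trying to split into 3 blocks with a regular expression. A new block starts whenever a name field appears. How can I return all 3 blocks?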

name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday

import re
lines = """name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""

a=re.findall("(name.*)[\n\S\s]", lines, re.MULTILINE)

Block 1 should be returned as "name: marvin\nattribute: one\nday: monday\ndayalt: test"

Thanks!

Answer 1

How about the following, which uses a positive lookahead:

import re

lines = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday"""

blocks = re.findall(r"name: .*?(?=name: |$)", lines, re.DOTALL)
print(blocks)
# ['name: marvin\nattribute: one\nday: monday\ndayalt: test\n',
#  'name: judy\nattribute: two\nday: tuesday\n',
#  'name: dot\nattribute: three\nday: wednesday']
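One thing to watch with the lookahead above: it looks for name: anywhere in the text, so a field whose key merely ends in "name" would also start a new block. The sketch below is a variant that anchors both the match and the lookahead to the start of a line. The nickname field and the variable names are made up for illustration; \Z is used instead of $ because, with re.MULTILINE, $ would already match at the end of the first line.

```python
import re

# Hypothetical input: "nickname: " contains the substring "name: ".
text = """name: marvin
nickname: marv
day: monday
name: judy
day: tuesday"""

# Unanchored lookahead: splits in the middle of "nickname".
unanchored = re.findall(r"name: .*?(?=name: |$)", text, re.DOTALL)

# Anchored variant: ^ (with re.MULTILINE) only matches at a line start,
# and \Z only matches at the very end of the string.
anchored = re.findall(
    r"^name: .*?(?=^name: |\Z)", text, re.DOTALL | re.MULTILINE
)

print(len(unanchored))  # 3
print(anchored)
# ['name: marvin\nnickname: marv\nday: monday\n', 'name: judy\nday: tuesday']
```

If the field names are known never to overlap, the simpler unanchored pattern is enough.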

Answer 2

If you use [\n\S\s] (which can be written as [\S\s], since \s also matches a newline), you don't need the re.DOTALL flag.
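A quick way to check this is to compare the dot and the character class on a two-line string. The string and variable names below are only for illustration.

```python
import re

sample = "name: marvin\nday: monday"

# Without re.DOTALL the dot stops at the newline, so the full match fails.
dot_default = re.fullmatch(r".*", sample)

# [\S\s] matches any character, newline included, with no flag at all.
class_default = re.fullmatch(r"[\S\s]*", sample)

# The dot behaves the same way once re.DOTALL is set.
dot_dotall = re.fullmatch(r".*", sample, re.DOTALL)

print(dot_default)                       # None
print(class_default.group() == sample)   # True
print(dot_dotall.group() == sample)      # True
```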

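However, your pattern (name.*)[\n\S\s] only matches name followed by the rest of the line and then one single character of any kind, because the character class is not repeated.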
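To see the effect, here is a small sketch that runs the pattern from the question on a shortened copy of the input. Because the pattern has one capture group, re.findall returns only what the group captured, which is the name line itself.

```python
import re

lines = """name: marvin
attribute: one
day: monday
name: judy
attribute: two
day: tuesday
"""

# (name.*) captures up to the end of the line; the character class then
# consumes exactly one more character (here the newline) and the match ends.
first_lines = re.findall(r"(name.*)[\n\S\s]", lines, re.MULTILINE)

print(first_lines)  # ['name: marvin', 'name: judy']
```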

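You can avoid the non-greedy quantifier, and the unnecessary backtracking it causes, by matching a line that starts with name: and then every following line that does not start with it.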

^name: .*(?:\n(?!name: ).*)*

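Explanation

^ Start of a line (re.MULTILINE is set, so ^ matches at every line start, not only at the start of the string)
name: .* Match name:, a space and the rest of the line
(?: Non-capturing group (repeated as a whole)
- \n Match a newline
- (?!name: ).* Negative lookahead asserting that name: is not directly to the right of the current position, then match the rest of the line
)* Close the non-capturing group and optionally repeat it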


Example

import re

pattern = r"^name: .*(?:\n(?!name: ).*)*"

lines = """name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""

matches = re.findall(pattern, lines, re.MULTILINE)
print(matches)

Output

[
'name: marvin\nattribute: one\nday: monday\ndayalt: test    << this is a field that can sometimes show up',
'name: judy\nattribute: two\nday: tuesday',
'name: dot\nattribute: three\nday: wednesday\n'
]
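Note that the last block keeps the trailing newline of the input. If the goal is to work with the individual fields afterwards, each block can be turned into a dictionary. The following is only a sketch: the helper name block_to_dict is made up, and it assumes every line has the form "key: value".

```python
import re

text = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
"""

pattern = r"^name: .*(?:\n(?!name: ).*)*"


def block_to_dict(block):
    """Turn one 'key: value' block into a dict (hypothetical helper)."""
    record = {}
    for line in block.splitlines():
        # partition splits on the first colon only, so values may contain ':'
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


records = [block_to_dict(b) for b in re.findall(pattern, text, re.MULTILINE)]
print(records)
# [{'name': 'marvin', 'attribute': 'one', 'day': 'monday', 'dayalt': 'test'},
#  {'name': 'judy', 'attribute': 'two', 'day': 'tuesday'}]
```

str.splitlines() drops the trailing newline of the last block, so no empty entry is created.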
