Python: how to create blocks from multiline text

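I have the block of text below, and I am trying to split it into 3 blocks with a regular expression. A new block starts whenever a name field appears. How can I return all 3 blocks?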

name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
import re
lines = """name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""

a=re.findall("(name.*)[\n\S\s]", lines, re.MULTILINE)

Block 1 would be returned as "name: marvin\nattribute: one\nday: monday\ndayalt: test"

Thanks!

How about the following, which uses a positive lookahead:

import re

lines = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday"""

blocks = re.findall(r"name: .*?(?=name: |$)", lines, re.DOTALL)
print(blocks)
# ['name: marvin\nattribute: one\nday: monday\ndayalt: test\n',
#  'name: judy\nattribute: two\nday: tuesday\n',
#  'name: dot\nattribute: three\nday: wednesday']
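One caveat with the lazy pattern above: name: is not anchored to the start of a line, so a block would also be cut at any field whose key merely ends in name: . The following is a minimal sketch of that behaviour, assuming a hypothetical nickname field that does not appear in the original data, together with a variant that anchors both occurrences with ^ under re.MULTILINE:

```python
import re

# Hypothetical input: the "nickname" field is an assumption for illustration only.
text = "name: marvin\nnickname: marv\nname: judy"

# Unanchored: the "name: " inside "nickname: " also ends one block and starts another.
unanchored = re.findall(r"name: .*?(?=name: |$)", text, re.DOTALL)

# Anchored: ^ (with re.MULTILINE) only accepts "name: " at the start of a line,
# and \Z marks the very end of the string.
anchored = re.findall(r"^name: .*?(?=^name: |\Z)", text, re.MULTILINE | re.DOTALL)

print(unanchored)  # ['name: marvin\nnick', 'name: marv\n', 'name: judy']
print(anchored)    # ['name: marvin\nnickname: marv\n', 'name: judy']
```

If no key in your data ends in name, the unanchored pattern behaves the same as the anchored one.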

If you use [\n\S\s] (which can be written as [\S\s], because \s also matches a newline), you do not need the re.DOTALL flag.

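But your pattern (name.*)[\n\S\s] only matches name followed by the rest of the line and then one single character of any kind, because the character class is not repeated. Since the pattern has one capture group, re.findall returns only that group, which is the name line itself.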
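To make that concrete, here is a small sketch of what the pattern from the question returns on a shortened copy of the input (the shortened string is only for illustration):

```python
import re

lines = "name: marvin\nattribute: one\nday: monday\nname: judy\nattribute: two\n"

# The character class consumes exactly one character (here the newline),
# and findall returns only the capture group, so each result is just the name line.
a = re.findall(r"(name.*)[\n\S\s]", lines, re.MULTILINE)
print(a)  # ['name: marvin', 'name: judy']
```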

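You can drop the non-greedy quantifier, and the unnecessary backtracking that comes with it, by matching a line that starts with name: and then all following lines that do not start with it.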

^name: .*(?:\n(?!name: ).*)*

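Explanation

  • ^ Start of a line (the start of the string, or the position after each newline, when re.MULTILINE is set)
  • name: .* Match name:, a space, and the rest of the line
  • (?: Non-capturing group (repeated as a whole)
    • \n Match a newline
    • (?!name: ).* Negative lookahead asserting that name: is not directly to the right of the current position, then match the rest of the line
  • )* Close the non-capturing group and repeat it zero or more times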
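The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is only a sketch: the name BLOCK is made up here, and because verbose mode ignores literal whitespace, the space after name: is written as [ ].

```python
import re

# BLOCK is a hypothetical name; the pattern is the same one, spread out with comments.
BLOCK = re.compile(
    r"""
    ^name:[ ] .*        # a line starting with name: and a space, up to the end of the line
    (?:                 # non-capturing group, repeated as a whole
        \n              # a newline
        (?!name:[ ])    # the next line must not start with name: and a space
        .*              # the rest of that line
    )*                  # zero or more continuation lines
    """,
    re.MULTILINE | re.VERBOSE,
)

text = "name: a\nx: 1\nname: b\ny: 2\n"
found = BLOCK.findall(text)
print(found)  # ['name: a\nx: 1', 'name: b\ny: 2\n']
```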


Example

import re

pattern = r"^name: .*(?:\n(?!name: ).*)*"

lines = """name: marvin
attribute: one
day: monday
dayalt: test    << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""

matches = re.findall(pattern, lines, re.MULTILINE)
print(matches)

Output

[
'name: marvin\nattribute: one\nday: monday\ndayalt: test    << this is a field that can sometimes show up',
'name: judy\nattribute: two\nday: tuesday',
'name: dot\nattribute: three\nday: wednesday\n'
]
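Note that only the last block keeps its trailing newline: the final \n is followed by an empty line, which does not start with name: , so the group matches it. If the blocks are going to be processed further, str.splitlines() sidesteps that difference. The sketch below assumes every line has the form key: value (an assumption, not something the question guarantees) and turns each block into a dict; the names records and record are illustrative only.

```python
import re

pattern = r"^name: .*(?:\n(?!name: ).*)*"
lines = "name: marvin\nattribute: one\nday: monday\nname: judy\nattribute: two\nday: tuesday\n"

records = []
for block in re.findall(pattern, lines, re.MULTILINE):
    record = {}
    # splitlines() drops the trailing newline that the last block carries
    for line in block.splitlines():
        if not line.strip():
            continue  # skip blank lines, if any
        # Assumes "key: value"; everything after the first ": " is the value
        key, _, value = line.partition(": ")
        record[key] = value
    records.append(record)

print(records)
# [{'name': 'marvin', 'attribute': 'one', 'day': 'monday'},
#  {'name': 'judy', 'attribute': 'two', 'day': 'tuesday'}]
```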