Python: how to create blocks from multiline text
I have the block of text below, and I am trying to split it into 3 blocks with a regular expression. A new block starts whenever a name field appears. How can I return all 3 blocks?
name: marvin
attribute: one
day: monday
dayalt: test << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
import re
lines = """name: marvin
attribute: one
day: monday
dayalt: test << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""
a=re.findall("(name.*)[\n\S\s]", lines, re.MULTILINE)
Block 1 should be returned as "name: marvin\nattribute: one\nday: monday\ndayalt: test"
Thanks!
How about the following, which uses a positive lookahead:
import re
lines = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday"""
blocks = re.findall(r"name: .*?(?=name: |$)", lines, re.DOTALL)
print(blocks)
# ['name: marvin\nattribute: one\nday: monday\ndayalt: test\n',
# 'name: judy\nattribute: two\nday: tuesday\n',
# 'name: dot\nattribute: three\nday: wednesday']
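One caveat with the lookahead approach is that `name: ` is not tied to the start of a line, so a field whose key merely ends in `name` also triggers a split. The sketch below uses a made-up `surname` field (not part of the question's data) to show the difference between the unanchored pattern and a variant anchored with `^` and `re.MULTILINE`.

```python
import re

# Hypothetical sample: "surname: " contains the text "name: " mid-line.
text = """name: marvin
surname: smith
name: judy
surname: jones"""

# Unanchored lookahead: also splits inside "surname: ".
loose = re.findall(r"name: .*?(?=name: |$)", text, re.DOTALL)

# Anchored variant: a block may only start at the beginning of a line.
anchored = re.findall(
    r"^name: .*?(?=^name: |\Z)", text, re.DOTALL | re.MULTILINE
)

print(len(loose))     # 4
print(len(anchored))  # 2
```

If the keys in the real data never end in `name`, the simpler unanchored pattern is probably enough.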
If you use [\n\S\s] (which can be written as [\S\s], because \s also matches a newline), you do not need the re.DOTALL flag.
But your pattern (name.*)[\n\S\s] only matches name followed by the rest of the line and then a single arbitrary character, because the character class is not repeated.
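A quick way to see this is to run the pattern on a shortened copy of the data. This is only an illustrative sketch: the second pattern, which repeats the class lazily up to the next `name: `, is one possible repair and not necessarily the one intended by the answer.

```python
import re

lines = """name: marvin
attribute: one
day: monday
name: judy
attribute: two
day: tuesday"""

# The original pattern: the class [\s\S] consumes exactly one character
# (here the newline), and only the group (name.*) is returned.
first_try = re.findall(r"(name.*)[\s\S]", lines)
print(first_try)  # ['name: marvin', 'name: judy']

# Repeating the class lazily and stopping at the next "name: " (or at the
# end of the string) captures whole blocks without re.DOTALL.
blocks = re.findall(r"name: [\s\S]*?(?=name: |$)", lines)
print(blocks)
# ['name: marvin\nattribute: one\nday: monday\n',
#  'name: judy\nattribute: two\nday: tuesday']
```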
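You can drop the non-greedy quantifier to avoid unnecessary backtracking, and instead match a line that starts with name: followed by all lines that do not start with it.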
^name: .*(?:\n(?!name: ).*)*
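Explanation
^
Start of a line (start of the string, or right after a newline when re.MULTILINE is set)
name: .*
Match name:, a space, and the rest of the line
(?:
Non-capturing group (repeated as a whole)
\n
Match a newline
(?!name: ).*
Negative lookahead: assert that name: is not directly to the right of the current position, then match the rest of the line
)*
Close the non-capturing group and optionally repeat it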
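The same breakdown can be kept next to the pattern itself by compiling it with `re.VERBOSE`. This is a sketch, not a required form: the name `block_re` and the short `sample` string are assumptions made for illustration. Because verbose mode ignores unescaped whitespace, the literal space after `name:` is written as `[ ]`.

```python
import re

block_re = re.compile(
    r"""
    ^name:[ ].*        # a line that starts with "name: "
    (?:                # non-capturing group, repeated as a whole
        \n             # a newline
        (?!name:[ ])   # the next line must not start with "name: "
        .*             # the rest of that line
    )*                 # zero or more following lines
    """,
    re.MULTILINE | re.VERBOSE,
)

sample = "name: marvin\nattribute: one\nname: judy\nattribute: two"
print(block_re.findall(sample))
# ['name: marvin\nattribute: one', 'name: judy\nattribute: two']
```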
Example
import re
pattern = r"^name: .*(?:\n(?!name: ).*)*"
lines = """name: marvin
attribute: one
day: monday
dayalt: test << this is a field that can sometimes show up
name: judy
attribute: two
day: tuesday
name: dot
attribute: three
day: wednesday
"""
matches = re.findall(pattern, lines, re.MULTILINE)
print(matches)
Output
[
'name: marvin\nattribute: one\nday: monday\ndayalt: test << this is a field that can sometimes show up',
'name: judy\nattribute: two\nday: tuesday',
'name: dot\nattribute: three\nday: wednesday\n'
]
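Note that the last match keeps a trailing "\n" because the input ends with a newline. If a regex is not a hard requirement, the same grouping rule (a line starting with `name: ` opens a new block) can be written as a plain loop. This is an alternative sketch rather than part of the answer above; `split_blocks` and `marker` are names invented for it.

```python
def split_blocks(text, marker="name: "):
    """Group lines into blocks; a line starting with `marker` opens a new block."""
    blocks = []
    for line in text.splitlines():
        if line.startswith(marker):
            blocks.append([line])    # start a new block
        elif blocks:
            blocks[-1].append(line)  # extend the current block
        # lines before the first marker are skipped, as the regex would do
    return ["\n".join(block) for block in blocks]


sample = """name: marvin
attribute: one
day: monday
dayalt: test
name: judy
attribute: two
day: tuesday
"""

result = split_blocks(sample)
print(result)
# ['name: marvin\nattribute: one\nday: monday\ndayalt: test',
#  'name: judy\nattribute: two\nday: tuesday']
```

Since `str.splitlines()` drops the line endings, none of the returned blocks carries a trailing newline.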