Replace An Exact Grouped Part in RegEx Python

Question

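I have a template in which I need to replace one part using a regex in Python. Here is my template (note that there is always at least one newline between the two comments):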

hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here

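I want to replace everything between <!--POSTS:START--> and <!--POSTS:END--> in Python. So I wrote the pattern <!--POSTS:START-->\n([^;]*)\n<!--POSTS:END-->, but the match also includes <!--POSTS:START--> and <!--POSTS:END-->.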

This is what I want:

re.sub('...', 'foo', message)

# expected result:
hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here

Thanks.

Answer 1

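You can use capture groups for the start and end markers and refer to them in the replacement string as \1, \2, and so on.

If the text contains <!--POSTS:START-->...<!--POSTS:END--> more than once, the regex with .*? replaces the content of each of those blocks separately. If the ? is removed from the regex, it instead replaces everything from the start of the first block to the end of the last block.

Try this: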

import re

s = '''
hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here
'''

# for multi-line matching need extra flags in the regexp
# \1 and \2 put the captured start and end markers back around the new text
s = re.sub(r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)', r'\1foo\2', s, flags=re.DOTALL)

# this inlines the DOTALL flag in the regexp for same result
# s = re.sub(r'(?s)(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)', r'\1foo\2', s)

print(s)

Output:

hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here
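
The difference between .*? and .* described in this answer can be sketched as follows. The two-block sample text is hypothetical (it is not from the question); the patterns are the ones used above, with and without the ?.

```python
import re

# Hypothetical sample with two marked blocks (not from the question).
text = (
    "<!--POSTS:START-->\n"
    "first\n"
    "<!--POSTS:END-->\n"
    "keep me\n"
    "<!--POSTS:START-->\n"
    "second\n"
    "<!--POSTS:END-->\n"
)

# Non-greedy: each START/END pair is matched and replaced on its own.
lazy = re.sub(r'(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)',
              r'\1foo\2', text, flags=re.DOTALL)

# Greedy: one match runs from the first START to the last END,
# so the text between the two blocks is lost as well.
greedy = re.sub(r'(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)',
                r'\1foo\2', text, flags=re.DOTALL)

print(lazy)
print(greedy)
```

With the non-greedy pattern the "keep me" line survives and both blocks contain foo; with the greedy pattern only a single block remains.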

Answer 2

You can use the following:

import re

new_content = re.sub(
    r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)', r"\1foo",
    content, flags=re.DOTALL)

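DOTALL flag: makes the '.' special character match any character at all, including a newline.

I am using two things to do what you want:

Lookahead group (?=...): asserts that the given subpattern can be matched at this position, without consuming any characters.
Non-greedy quantifier (*?): matches as few characters as possible, so each START/END pair is matched separately.

Since we are using a lookahead, the \n<!--POSTS:END--> part is not consumed, so I only need to keep the first group and rewrite the content between the markers. That is why I use \1foo rather than \1foo\2.

If you only need to modify the first match, you can use count=1: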

re.sub(..., count=1)

You can put anything between those two marker lines and it will work as expected.
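
Combining the lookahead pattern with count=1 might look like the sketch below. The variable content and its two-block text are assumptions made for illustration; only the pattern, the replacement and the count argument come from this answer.

```python
import re

# Hypothetical input with two blocks (not from the question).
content = (
    "<!--POSTS:START-->\n"
    "old one\n"
    "<!--POSTS:END-->\n"
    "<!--POSTS:START-->\n"
    "old two\n"
    "<!--POSTS:END-->\n"
)

# The lookahead leaves "\n<!--POSTS:END-->" unconsumed, so only group 1
# has to be written back. count=1 stops after the first match.
first_only = re.sub(
    r'(<!--POSTS:START-->\n).*?(?=\n<!--POSTS:END-->)', r"\1foo",
    content, count=1, flags=re.DOTALL)

print(first_only)
```

Only the first block is rewritten; the second block still contains "old two".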

Answer 3

Check this: https://docs.python.org/3/library/re.html

import re

pattern = r"(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)"
string = """hello
how's everything

<!--POSTS:START-->
some text
<!--POSTS:END-->

Some code here"""
result = re.sub(pattern, r"\g<1>foo\g<2>", string)
print(result)

Result:

hello
how's everything

<!--POSTS:START-->
foo
<!--POSTS:END-->

Some code here
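
One caveat with the pattern in this answer, shown as a small sketch: it uses .* without re.DOTALL, so . does not match newlines and the pattern only matches when the text between the markers is a single line. The multi-line sample below is hypothetical; adding flags=re.DOTALL and the non-greedy .*? is the adjustment used in the other answers.

```python
import re

# Hypothetical block whose body spans two lines (not from the question).
multi = "<!--POSTS:START-->\nline one\nline two\n<!--POSTS:END-->"

# Without DOTALL, "." cannot cross the newline inside the body,
# so nothing matches and the text comes back unchanged.
unchanged = re.sub(r"(<!--POSTS:START-->\n).*(\n<!--POSTS:END-->)",
                   r"\g<1>foo\g<2>", multi)

# With DOTALL and a non-greedy quantifier the whole body is replaced.
replaced = re.sub(r"(<!--POSTS:START-->\n).*?(\n<!--POSTS:END-->)",
                  r"\g<1>foo\g<2>", multi, flags=re.DOTALL)

print(unchanged == multi)
print(replaced)
```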


Tags: python, regex, regexp-replace, python-regex, python-re