How to remove multiline text from matching strings?

Question

I have the following snippet, which I would like to remove completely using a regex (or some other method).

# ---------------------------------------------------------------------------------------------------------------------
# MODULE PARAMETERS
# These are the variables we have to pass in to use the module specified in the terragrunt configuration above
# ---------------------------------------------------------------------------------------------------------------------

Is there syntax that tells the match to remove everything between two matches?

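It seems like this should be easy to do, but for some reason I have only been able to find a regex that picks out the first and last matches with the code below. I have tried many permutations of this regex but cannot get it to work.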

...
re.sub(r'# --*[\S\s]*---', '', lines[line])
...

This regex tool says my regex should work.

Edit:

The text I am interested in is being read line by line.

...
for the_file in files_to_update:
    with open(the_file + "/the_file", "r") as in_file:
        lines = in_file.readlines()

and is subsequently iterated over. The re.sub snippet above actually runs inside this loop.

for line in range(len(lines)):

Answer 1

You should read the file into a single variable so that you can run a regex on it that can match multiline text.

You can use

import re

with open(filepath, 'r') as fr:
    with open(filesavepath, 'w') as fw:
        # Read the whole file as one string so the pattern can span several lines
        fw.write(re.sub(r'^# -+(?:\n# .*)*\n# -+$\n?', '', fr.read(), flags=re.M))

See the Python demo and a regex demo.

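Here, fr is the handle of the file you read from and fw is the handle of the file you write to. The input to re.sub is fr.read(), which grabs the whole file contents and passes them to the regex engine.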
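To make the difference concrete, here is a minimal sketch that contrasts the line-by-line substitution from the question with a single substitution over the whole text. The sample string and the variable names are made up for illustration, and the dashed lines are shortened.

```python
import re

# Hypothetical sample standing in for the contents of the file.
text = (
    "# -----\n"
    "# MODULE PARAMETERS\n"
    "# These are the variables\n"
    "# -----\n"
    "\n"
    "foo = 1\n"
)

# Line by line, as in the question: each call only ever sees one line,
# so the pattern can strip the dashed lines but never the text between them.
asker_pattern = r'# --*[\S\s]*---'
per_line = "".join(
    re.sub(asker_pattern, '', line) for line in text.splitlines(keepends=True)
)

# On the whole text, the pattern from this answer can span several lines.
block_pattern = r'^# -+(?:\n# .*)*\n# -+$\n?'
whole = re.sub(block_pattern, '', text, flags=re.M)

print(repr(per_line))
print(repr(whole))
```

In the per-line result the two comment lines between the dashed lines are still present, while the whole-text result keeps only the blank line and `foo = 1`.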

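The regex means:

^ - start of a line (due to re.M)
# -+ - a #, a space, then one or more hyphens
(?:\n# .*)* - zero or more repetitions of a newline, a #, a space, and any text up to the end of the line
\n - a newline
# -+$ - a #, a space, one or more hyphens, and the end of a line
\n? - an optional newline.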
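The breakdown above can also be written into the pattern itself with re.VERBOSE. This is only a sketch: the name BANNER_BLOCK and the sample text are assumptions made for illustration. In verbose mode whitespace is ignored and # starts a comment, so the literal # is escaped and the literal space is written as [ ].

```python
import re

# Same pattern as above, annotated. BANNER_BLOCK is a hypothetical name.
BANNER_BLOCK = re.compile(
    r"""
    ^\#[ ]-+         # start of a line, then '#', a space, one or more hyphens
    (?:\n\#[ ].*)*   # zero or more lines of: newline, '#', space, rest of line
    \n\#[ ]-+$       # newline, then the closing dashed line up to end of line
    \n?              # optional newline after the block
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical sample with two banner blocks and one ordinary comment.
sample = (
    "# ---\n# A\n# ---\n"
    "foo\n# some other comment\nbar\n"
    "# ---\n# B\n# ---\n"
    "baz\n"
)

result = BANNER_BLOCK.sub("", sample)
print(result)
```

Both banner blocks are removed, and the ordinary comment survives because it does not start with a # followed by a space and a hyphen.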

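A non-regex way to remove the comment is to read line by line, check whether a line starts with # ---, and set a flag that tracks whether we are inside the comment: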

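lines = []
flag = False  # True while we are inside a comment block
for line in fr:
    if line.startswith('# ---'):
        flag = not flag  # a dashed line opens or closes the block
        continue
    if not flag:
        lines.append(line)  # keep only the lines outside the block

print("".join(lines))  # lines read from a file already end with a newline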

See this Python demo.
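Since the question loops over several files, the flag approach could be wrapped in a small helper that rewrites one file in place. This is a sketch under assumptions: the function name strip_banner_blocks and its marker parameter are hypothetical, and every block is assumed to be opened and closed by a line starting with the marker.

```python
from pathlib import Path


def strip_banner_blocks(path, marker="# ---"):
    """Rewrite the file at `path` without the blocks delimited by `marker` lines.

    Returns the number of lines that were kept.
    """
    path = Path(path)
    kept = []
    in_block = False
    with path.open("r") as fr:
        for line in fr:
            if line.startswith(marker):
                in_block = not in_block  # toggle on every delimiter line
                continue                 # drop the delimiter itself
            if not in_block:
                kept.append(line)
    # Lines keep their own newlines, so join them without a separator.
    path.write_text("".join(kept))
    return len(kept)
```

In the question's loop this would be called once per file, for example strip_banner_blocks(the_file + "/the_file"). An unbalanced delimiter would cause everything after it to be dropped, so it may be worth checking in_block after the loop before writing.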

Answer 2

Why not just use string functions with a small function?

data = """
# ---------------------------------------------------------------------------------------------------------------------
# MODULE PARAMETERS
# These are the variables we have to pass in to use the module specified in the terragrunt configuration above
# ---------------------------------------------------------------------------------------------------------------------

foo
# some other comment
bar

"""

def remover(block):
    remove = False

    for line in block.split("\n"):
        if line.startswith("# ---"):
            remove = not remove
        elif not remove:
            yield line

cleaned = [line for line in remover(data)]
print(cleaned)

This yields

['', '', 'foo', '# some other comment', 'bar', '', '']
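If a single string is needed rather than a list, for example to write it back to a file, the kept lines can be joined again. The sketch below repeats the remover generator so that it runs on its own. The shortened sample text and the name cleaned_text are made up for illustration.

```python
def remover(block):
    # Same logic as above: toggle on dashed lines, yield lines outside blocks.
    remove = False
    for line in block.split("\n"):
        if line.startswith("# ---"):
            remove = not remove
        elif not remove:
            yield line


# Hypothetical, shortened sample.
data = (
    "\n"
    "# -----\n"
    "# MODULE PARAMETERS\n"
    "# -----\n"
    "\n"
    "foo\n"
    "# some other comment\n"
    "bar\n"
)

# split("\n") removed the newlines, so put them back when joining.
cleaned_text = "\n".join(remover(data))
print(cleaned_text)
```

The blank lines that surrounded the block are kept, so the result starts with two newlines. Strip them separately if that is not wanted.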
