How to remove or replace a multiline text between two patterns

I would like to add some customer flags to some of my scripts, so that a shell script can parse them out before packaging.

Let's say, delete all multiline text between

^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\n

and

^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\n

I want it to be tolerant (regarding the number of "_"), which is why I am using a regex.

For example:

before.foo

i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again

would become:

after.foo

i want this
and this
and this again

I would rather use sed, but any clever solution is welcome :)

Something like this:

cat before.foo |  tr '\n' '\a' | sed -r 's/([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\a.*\a([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\a/\a/g' | tr '\a' '\n' > after.foo

Here is an awk solution along these lines, written and tested with your shown samples.

awk '
/^([#]|[/][/])__+NOT_FOR_CUSTOMER_BEGIN/{ found=1       }
/^([#]|[/][/])__+NOT_FOR_CUSTOMER_END/  { found=""; next}
!found
'  Input_file

With your shown samples, the output is as follows.

i want this
and this
and this again

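Explanation: whenever the start string is found (matched by regex), a flag is set to true, which suppresses printing. Whenever the end string is found (also matched by regex), the flag is cleared and that line is skipped, so printing resumes with the next line.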
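For readers who prefer Python over awk, the same flag logic can be sketched as follows. This is a minimal sketch, not part of the answer above: the helper name `strip_customer_blocks` and the constants `BEGIN` and `END` are hypothetical, and the patterns only loosely mirror the awk regexes (a `#` or `//` prefix is required).

```python
import re

# Assumed marker patterns, loosely mirroring the awk regexes:
# a "#" or "//" prefix, one or more underscores, then the marker name.
BEGIN = re.compile(r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN")
END = re.compile(r"^(?:#|//)_+NOT_FOR_CUSTOMER_END")


def strip_customer_blocks(lines):
    """Yield only the lines outside BEGIN/END blocks (hypothetical helper)."""
    skipping = False
    for line in lines:
        if BEGIN.match(line):
            skipping = True    # start marker: stop emitting lines
        elif END.match(line):
            skipping = False   # end marker: resume with the next line
            continue           # the end marker itself is dropped
        if not skipping:
            yield line


sample = [
    "i want this",
    "#____NOT_FOR_CUSTOMER_BEGIN________",
    "not this",
    "nor this",
    "#________NOT_FOR_CUSTOMER_END____",
    "and this",
    "//____NOT_FOR_CUSTOMER_BEGIN__",
    "not this again",
    "nor this again",
    "//__________NOT_FOR_CUSTOMER_END____",
    "and this again",
]

kept = list(strip_customer_blocks(sample))
print("\n".join(kept))
```

Because it works line by line, this approach never needs the whole file in memory, which is the same property that makes the awk version convenient.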

You could use a Python script:

import re

data = """
i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again
"""

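# \1 requires the closing line to start with the same prefix (# or //) as the opening line;
# the lazy +? stops at the first such line instead of the last one.
rx = re.compile(r'^(#|//)(?:.+\n)+?^\1.+\n?', re.MULTILINE)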
data = rx.sub('', data)
print(data)

Which would yield

i want this
and this
and this again

See a demo on regex101.com.

You can match as few lines as possible from NOT_FOR_CUSTOMER_BEGIN to NOT_FOR_CUSTOMER_END.

Note that [//] is a character class, so it matches a single / rather than //.
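A quick way to check this is to run both spellings of the prefix against the two marker styles. The snippet below is only an illustration using Python's re module. The variable names are made up for the example, and the character-class pattern drops the optional {0,1} from the question to keep the comparison simple.

```python
import re

# A character class matches ONE character from its set, so "[//]" equals "[/]".
char_class = re.compile(r"^([#]|[//])_+NOT_FOR_CUSTOMER_BEGIN")
# An alternation can match the two-character sequence "//".
alternation = re.compile(r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN")

hash_line = "#____NOT_FOR_CUSTOMER_BEGIN________"
slash_line = "//____NOT_FOR_CUSTOMER_BEGIN__"

class_results = [bool(char_class.match(s)) for s in (hash_line, slash_line)]
alt_results = [bool(alternation.match(s)) for s in (hash_line, slash_line)]

print(class_results)  # [True, False]
print(alt_results)    # [True, True]
```

The character-class version fails on the // line because, after consuming one /, it expects an underscore but finds the second /.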

^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*
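  • ^ Start of the string (or of a line, when the multiline flag is set)
  • (?:#|//) Match either # or //
  • _+NOT_FOR_CUSTOMER_BEGIN_+ Match NOT_FOR_CUSTOMER_BEGIN between 1 or more underscores
  • (?:\n.*)*? Repeat lines, as few as possible
  • \n(?:#|//)_+NOT_FOR_CUSTOMER_END_+ Match a newline, then # or //, then NOT_FOR_CUSTOMER_END between 1 or more underscores
  • \n* Match optional trailing newlines, so they are removed as well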
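The breakdown above can also be written into the pattern itself with re.VERBOSE, so that each piece carries its own comment. This is a sketch under stated assumptions: the names `BLOCK_RX`, `sample`, and `cleaned` are illustrative, and because an unescaped # starts a comment in verbose mode, the prefix is written as [#] here. The sample block deliberately contains an empty line, which `.*` (unlike `.+`) still matches.

```python
import re

BLOCK_RX = re.compile(
    r"""
    ^(?:[#]|//)                  # line start, then "#" or "//"
    _+NOT_FOR_CUSTOMER_BEGIN_+   # BEGIN marker between one or more underscores
    (?:\n.*)*?                   # as few following lines as possible
    \n(?:[#]|//)                 # newline, then "#" or "//"
    _+NOT_FOR_CUSTOMER_END_+     # END marker between one or more underscores
    \n*                          # optional trailing newlines
    """,
    re.MULTILINE | re.VERBOSE,
)

sample = (
    "keep 1\n"
    "#__NOT_FOR_CUSTOMER_BEGIN__\n"
    "secret\n"
    "\n"
    "more secret\n"
    "#__NOT_FOR_CUSTOMER_END__\n"
    "keep 2\n"
    "//_NOT_FOR_CUSTOMER_BEGIN_\n"
    "secret\n"
    "//_NOT_FOR_CUSTOMER_END_\n"
    "keep 3\n"
)

cleaned = BLOCK_RX.sub("", sample)
print(cleaned, end="")
```

Note that the trailing \n* also swallows any blank lines that directly follow an END marker, which may or may not be wanted.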

Regex demo

An example using the pattern in Python:

import re

regex = r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*"

s = ("i want this\n"
            "#____NOT_FOR_CUSTOMER_BEGIN________\n"
            "not this\n"
            "nor this\n"
            "#________NOT_FOR_CUSTOMER_END____\n"
            "and this\n"
            "//____NOT_FOR_CUSTOMER_BEGIN__\n"
            "not this again\n"
            "nor this again\n"
            "//__________NOT_FOR_CUSTOMER_END____\n"
            "and this again")

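subst = ""
# Pass flags by keyword; positional count/flags arguments to re.sub are deprecated since Python 3.13.
result = re.sub(regex, subst, s, flags=re.MULTILINE)

if result:
    print(result)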

Output

i want this
and this
and this again

sed is the simplest tool for this, because it can delete the lines from a start pattern through an end pattern:

sed -E '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/d' file

i want this
and this
and this again

If you are looking for an awk solution, here is a simpler awk:

awk '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/{next} 1' file