How to remove or replace a multiline text between two patterns
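I would like to add some customer flags to some of my scripts so that they can be parsed before the shell scripts are packaged.
Say, delete all the multiline text between
^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\n
and
^([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\n
I want it to be error-tolerant (about the number of "_"), which is why I am using a regular expression.
For example: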
before.foo
i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again
would become:
after.foo
i want this
and this
and this again
I would prefer to use sed, but any clever solution is welcome :)
Something like this:
cat before.foo | tr '\n' '\a' | sed -r 's/([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_BEGIN[_]+\a.*\a([#]|[//]){0,1}[_]+NOT_FOR_CUSTOMER_END[_]+\a/\a/g' | tr '\a' '\n' > after.foo
With awk, you could try the following, written and tested with your shown samples.
awk '
/^([#]|[/][/])__+NOT_FOR_CUSTOMER_BEGIN/{ found=1 }
/^([#]|[/][/])__+NOT_FOR_CUSTOMER_END/ { found=""; next}
!found
' Input_file
With your shown samples, the output is as follows.
i want this
and this
and this again
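Explanation: whenever the start marker is found (by regex), set a flag to true so that lines stop being printed; whenever the end marker is found (by regex), clear the flag so that printing resumes from the next line.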
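The same flag-based filter can be sketched in Python, for readers who want to embed it in a packaging script. This is a minimal sketch, not part of the original answer: the function name `strip_blocks` and the variable `sample` are hypothetical, and the two regexes simply mirror the awk patterns above (a `#` or `//` prefix followed by two or more underscores).

```python
import re

# These patterns mirror the awk regexes above (prefix, then 2+ underscores).
BEGIN = re.compile(r"^(?:#|//)__+NOT_FOR_CUSTOMER_BEGIN")
END = re.compile(r"^(?:#|//)__+NOT_FOR_CUSTOMER_END")


def strip_blocks(lines):
    """Yield only the lines outside BEGIN/END blocks; marker lines are dropped too."""
    found = False  # True while inside a block, like the awk flag
    for line in lines:
        if BEGIN.match(line):
            found = True
        elif END.match(line):
            found = False
            continue  # skip the END marker itself, like awk's `next`
        if not found:
            yield line


# Hypothetical sample input, copied from the question's before.foo.
sample = """i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again
"""

print("".join(strip_blocks(sample.splitlines(keepends=True))), end="")
```

As with the awk version, a BEGIN marker that is never closed suppresses everything up to the end of the input.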
You can use a Python script:
import re
data = """
i want this
#____NOT_FOR_CUSTOMER_BEGIN________
not this
nor this
#________NOT_FOR_CUSTOMER_END____
and this
//____NOT_FOR_CUSTOMER_BEGIN__
not this again
nor this again
//__________NOT_FOR_CUSTOMER_END____
and this again
"""
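# Match from a line starting with # or // through the last following line that
# starts with the same prefix (\1); the lines in between must be non-empty.
# Note: this keys on the comment prefix only, not on the marker names.
rx = re.compile(r'^(#|//)(?:.+\n)+^\1.+\n?', re.MULTILINE)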
data = rx.sub('', data)
print(data)
Which yields
i want this
and this
and this again
You can match as few lines as possible from NOT_FOR_CUSTOMER_BEGIN_ to NOT_FOR_CUSTOMER_END_.
Note that [//] matches a single / rather than //.
^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*
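^
Start of the string (or of each line, when the multiline flag is set)
(?:#|//)
Match # or //
_+NOT_FOR_CUSTOMER_BEGIN_+
Match NOT_FOR_CUSTOMER_BEGIN between 1 or more underscores
(?:\n.*)*?
Repeat lines, as few as possible
\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+
Match a newline, then # or // and NOT_FOR_CUSTOMER_END between 1 or more underscores
\n*
Remove optional trailing newlines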
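The point about [//] can be checked directly. The following is a small sketch (the variable names are hypothetical) showing that a character class consumes exactly one character, so a pattern built on it cannot match a line that starts with two slashes, while the alternation (?:#|//) can.

```python
import re

line = "//____NOT_FOR_CUSTOMER_BEGIN__"

# [//] is a character class containing only "/", so it consumes one character.
single = re.match(r"[//]", line).group()

# With one slash consumed, the next character is still "/", so _+ cannot match.
with_class = re.match(r"[//]_+NOT_FOR_CUSTOMER_BEGIN", line)

# The alternation treats "//" as a two-character unit and matches the marker.
with_alternation = re.match(r"(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+", line)

print(repr(single))
print(with_class)
print(with_alternation.group())
```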
An example of using it with Python:
import re
regex = r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+(?:\n.*)*?\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*"
s = ("i want this\n"
"#____NOT_FOR_CUSTOMER_BEGIN________\n"
"not this\n"
"nor this\n"
"#________NOT_FOR_CUSTOMER_END____\n"
"and this\n"
"//____NOT_FOR_CUSTOMER_BEGIN__\n"
"not this again\n"
"nor this again\n"
"//__________NOT_FOR_CUSTOMER_END____\n"
"and this again")
subst = ""
# Pass flags by keyword; re.MULTILINE lets ^ match at the start of every line.
result = re.sub(regex, subst, s, flags=re.MULTILINE)
if result:
    print(result)
Output
i want this
and this
and this again
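The lazy quantifier `*?` matters as soon as the input holds more than one block. Here is a minimal sketch comparing a greedy and a lazy variant of the pattern on a hypothetical two-block input (`text`, `head`, and `tail` are illustrative names, not from the answer): the greedy form runs from the first BEGIN to the last END and swallows the line in between.

```python
import re

# Hypothetical input with two blocks and a line to keep between them.
text = (
    "keep 1\n"
    "#__NOT_FOR_CUSTOMER_BEGIN__\n"
    "drop 1\n"
    "#__NOT_FOR_CUSTOMER_END__\n"
    "keep 2\n"
    "#__NOT_FOR_CUSTOMER_BEGIN__\n"
    "drop 2\n"
    "#__NOT_FOR_CUSTOMER_END__\n"
    "keep 3\n"
)

head = r"^(?:#|//)_+NOT_FOR_CUSTOMER_BEGIN_+"
tail = r"\n(?:#|//)_+NOT_FOR_CUSTOMER_END_+\n*"

# Greedy: repeats as many lines as possible, so it stops at the LAST end marker.
greedy = re.sub(head + r"(?:\n.*)*" + tail, "", text, flags=re.MULTILINE)

# Lazy: repeats as few lines as possible, so it stops at the NEAREST end marker.
lazy = re.sub(head + r"(?:\n.*)*?" + tail, "", text, flags=re.MULTILINE)

print(repr(greedy))  # "keep 2" is lost
print(repr(lazy))    # all three "keep" lines survive
```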
sed is the simplest tool for this, because it can delete the lines from a start pattern to an end pattern:
sed -E '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/d' file
i want this
and this
and this again
If you are looking for an awk solution, here is a simpler awk:
awk '/_+NOT_FOR_CUSTOMER_BEGIN_+/,/_+NOT_FOR_CUSTOMER_END_+/{next} 1' file