What is different about these two pairs of strings that makes this sed script work with one and not the other?
This question is related to another question I asked earlier today.
I have a text file like this:
I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
<this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone>
A novice programmer walked into a "BAR2" descript keepthis
and this even more text, let's keep it
<I actually want this>
and this= too.
When I run this script with sed -f script.sed file.txt:
# Check for "aff"
/\baff\b/ {
# Define a label "a"
:a
# If the line does not contain "desc"
/\bdesc\b/!{
# Get the next line of input and append
# it to the pattern buffer
N
# Branch back to label "a"
ba
}
# Replace everything between aff and desc
s/\(\baff\)\b.*\b\(desc\b\)/\1TEST DATA\2/
}
I get this as my output, which is identical to the input:
I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
<this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone>
A novice programmer walked into a "BAR2" descript keepthis
and this even more text, let's keep it
<I actually want this>
and this= too.
However, simply changing the search strings from aff and desc to FOO1 and BAR2:
# Check for "FOO1"
/\bFOO1\b/ {
# Define a label "a"
:a
# If the line does not contain "BAR2"
/\bBAR2\b/!{
# Get the next line of input and append
# it to the pattern buffer
N
# Branch back to label "a"
ba
}
# Replace everything between FOO1 and BAR2
s/\(\bFOO1\)\b.*\b\(BAR2\b\)/\1TEST DATA\2/
}
gives the expected output:
I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1TEST DATABAR2" descript keepthis
and this even more text, let's keep it
<I actually want this>
and this= too.
I have absolutely no idea what is going on here. Why does searching between FOO1 and BAR2 behave differently from the exact same script using aff and desc?
The end marker should be \bdesc rather than \bdesc\b.
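Note the \b in the pattern: it matches a word boundary. Your text above contains the word descript, but not desc as a word of its own, so \bdesc\b never matches. The loop therefore never finds its end marker and the substitution is never reached. "BAR2", by contrast, is surrounded by quotes, which are non-word characters, so \bBAR2\b matches.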
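To see the word-boundary effect in isolation, here is a small sketch that tests the three patterns against the fourth line of the sample file. It assumes GNU sed, where \b is available as an extension; the variable names (line, strict, loose, bar) are just for illustration.

```shell
# Assumes GNU sed; \b is a GNU extension and may not exist in other seds.
line='A novice programmer walked into a "BAR2" descript keepthis'

# \bdesc\b needs a word boundary right after "desc", but the next
# character is "r" (a word character), so nothing is printed.
strict=$(printf '%s\n' "$line" | sed -n '/\bdesc\b/p')

# \bdesc only needs a boundary before "desc", so the line is printed.
loose=$(printf '%s\n' "$line" | sed -n '/\bdesc/p')

# \bBAR2\b matches because the surrounding quotes are non-word characters.
bar=$(printf '%s\n' "$line" | sed -n '/\bBAR2\b/p')

printf 'strict=[%s]\nloose=[%s]\nbar=[%s]\n' "$strict" "$loose" "$bar"
```

Here strict should come back empty, while loose and bar should both contain the whole line.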
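Your previous question led me to assume that was what you wanted. If you don't care about word boundaries, remove the \b escape sequences entirely.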
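For reference, a sketch of the script with the relaxed end marker, run against the sample file from the question. It assumes GNU sed, writes to a temporary directory created with mktemp, and uses \1TEST DATA\2 as the replacement; that replacement is an assumption inferred from the expected output in the question, which keeps both markers in place.

```shell
# Assumes GNU sed (\b is a GNU extension).
tmp=$(mktemp -d)

# Sample input copied from the question.
cat > "$tmp/file.txt" <<'EOF'
I want= to keep this
This is some <text> I want to keep <and "something" in tags that I" want to keep> aff FOO1 WebServices and some more "text" that" should "</be> </deleted>
<this is stuff in tags I want=to begone> and other text I want gone too. </this is stuff in tags I want to begone>
A novice programmer walked into a "BAR2" descript keepthis
and this even more text, let's keep it
<I actually want this>
and this= too.
EOF

cat > "$tmp/script.sed" <<'EOF'
/\baff\b/ {
  :a
  # End marker relaxed to \bdesc so that "descript" ends the loop
  /\bdesc/!{
    N
    ba
  }
  # Keep both markers (assumed) and replace what lies between them
  s/\(\baff\)\b.*\b\(desc\)/\1TEST DATA\2/
}
EOF

result=$(sed -f "$tmp/script.sed" "$tmp/file.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With GNU sed, lines two through four should collapse into a single line ending in affTEST DATAdescript keepthis, and the remaining lines should pass through unchanged.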