Unix: print pattern between the strings
I have a file with content as below. START and STOP represent a block.
START
X | 123
Y | abc
Z | +=-
STOP
START
X | 456
Z | +%$
STOP
START
X | 789
Y | ghi
Z | !@#
STOP
I want to print the values of X and Y for each block in the following format:
123 ~~ abc
456 ~~
789 ~~ ghi
If START/STOP occurred only once, sed -n '/START/,/STOP/p' would have helped. Since the blocks repeat, I need your help.
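Sed is always the wrong choice for any problem that involves processing multiple lines. All of sed's arcane constructs became obsolete in the mid-1970s when awk was invented.
Whenever you have name-value pairs in your input, I find it useful to create an array that maps each name to its value and then access the array by name. In this case, using GNU awk for multi-char RS and for deleting a whole array: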
$ cat tst.awk
BEGIN {
    RS = "\nSTOP\n"
    OFS = " ~~ "
}
{
    delete n2v
    for (i=2; i<=NF; i+=3) {
        n2v[$i] = $(i+2)
    }
    print n2v["X"], n2v["Y"]
}
$ gawk -f tst.awk file
123 ~~ abc
456 ~~
789 ~~ ghi
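If GNU awk is not available, the same name-to-value array idea can be sketched line by line, without a multi-character RS. This is only an illustration and not part of the answer above: the function name extract_xy is made up, and it assumes every data line has the form "name | value" with no spaces inside the value.

```shell
# Hypothetical helper: map each name to its value, print X and Y at every STOP.
extract_xy() {
    awk -v OFS=' ~~ ' '
        $1 == "START" { split("", n2v); next }            # portable way to empty the array
        $1 == "STOP"  { print n2v["X"], n2v["Y"]; next }  # a missing name prints as empty
        { n2v[$1] = $3 }                                  # field 1 is the name, field 3 the value
    ' "$@"
}

# Usage with the sample data from the question, fed on standard input.
extract_xy <<'EOF'
START
X | 123
Y | abc
Z | +=-
STOP
START
X | 456
Z | +%$
STOP
START
X | 789
Y | ghi
Z | !@#
STOP
EOF
```

Clearing the array with split("", n2v) stands in for delete n2v, which older awks may not accept for a whole array.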
Based on my own solution to "How to select lines between two marker patterns which may occur multiple times with awk/sed":
awk -v OFS=" ~~ " '
    /START/{flag=1;next}
    /STOP/{flag=0; print first, second; first=second=""}
    flag && $1=="X" {first=$3}
    flag && $1=="Y" {second=$3}' file
Test
$ awk -v OFS=" ~~ " '/START/{flag=1;next}/STOP/{flag=0; print first, second; first=second=""} flag && $1=="X" {first=$3} flag && $1=="Y" {second=$3}' a
123 ~~ abc
456 ~~
789 ~~ ghi
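One thing to watch with the flag approach is that /START/ and /STOP/ match anywhere in a line, so a value that happens to contain one of those words would be taken for a marker. A slightly stricter variant might look like the sketch below. It assumes the markers always occupy a whole line and that name and value are separated by exactly " | "; the function name print_xy is made up for the example.

```shell
# Hypothetical helper: anchored markers, fields split on " | " so values may contain spaces.
print_xy() {
    awk -F ' [|] ' -v OFS=' ~~ ' '
        /^START$/ { flag = 1; first = second = ""; next }        # open a block, reset values
        /^STOP$/  { if (flag) print first, second; flag = 0; next }
        flag && $1 == "X" { first  = $2 }                        # with this FS the value is field 2
        flag && $1 == "Y" { second = $2 }
    ' "$@"
}

# A value containing a space and a value containing the word STOP.
printf 'START\nX | foo bar\nY | STOPPED\nSTOP\n' | print_xy
# prints: foo bar ~~ STOPPED
```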
Because I like brain teasers (not because this sort of thing is practical in sed), a possible sed solution is
sed -n '/START/,/STOP/ { //!H; // { g; /^$/! { s/.*\nX | \([^\n]*\).*/\1 ~~/; ta; s/.*/~~/; :a G; s/\n.*Y | \([^\n]*\).*/ \1/; s/\n.*//; p; s/.*//; h } } }'
This works as follows:
/START/,/STOP/ {                      # between two start and stop lines
  //! H                               # assemble the lines in the hold buffer
                                      # note that // repeats the previously
                                      # matched pattern, so // matches the
                                      # start and end lines, //! all others.
  // {                                # At the end
    g                                 # That is: When it is one of the
    /^$/! {                           # boundary lines and the hold buffer
                                      # is not empty
      s/.*\nX | \([^\n]*\).*/\1 ~~/   # isolate the X value, append ~~
      ta                              # if there is no X value, just use ~~
      s/.*/~~/
      :a
      G                               # append the hold buffer to that
      s/\n.*Y | \([^\n]*\).*/ \1/     # and isolate the Y value so that
                                      # the pattern space contains X ~~ Y
      s/\n.*//                        # Cutting off everything after a newline
                                      # is important if there is no Y value
                                      # and the previous substitution did
                                      # nothing
      p                               # print the result
      s/.*//                          # and make sure the hold buffer is
      h                               # empty for the next block.
    }
  }
}
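The empty regex // is the least obvious piece of the script, so here it is in isolation. This is a minimal sketch, assuming the behaviour I know from GNU sed, where // reuses the regex that was matched last; the function name inner_lines is made up for the example.

```shell
# Hypothetical helper: print only the lines strictly between START and STOP.
inner_lines() {
    # Inside the range, // stands for /START/ on the START line and for /STOP/
    # on every later line, so //!p skips both boundary lines.
    sed -n '/START/,/STOP/ { //!p; }' "$@"
}

printf 'before\nSTART\nX | 123\nSTOP\nafter\n' | inner_lines
# prints: X | 123
```

The full script uses the same mechanism twice: //!H collects the inner lines, and the // { ... } group runs only on the START and STOP lines.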