sed -f not running multiple similar pattern match commands, including concatenate, against input file with multiple lines?

Question

I have a bunch of sed commands in a batch file, which I run using -f.

/PATTERN1 /I,/;/s/^[ \t]*//g
/PATTERN1 /I{:a;/;/!N;s/\n/ /;ta;P;D}
s/\(PATTERN1\) \([ \tA-Za-z0-9,\"\']*\)(\(.*\))[ \t]*;[ \t]*$/\1 \2\3;/I

If I run

gsed -f sed-file.sed input-file

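It seems that, because several lines share the same pattern match, only the first line runs and the others are ignored. If I comment the lines out one at a time, each one works fine on its own, but if I run them all uncommented, only the first match is processed.

My sample input file is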

    not (this line);
pattern1 some text, ( some other text (5), some other text (6));
pattern1 this text
(
     that text (6),
     that text (7),
);
not this text either;

I want it to look like this

    not (this line);
pattern1 some text,  some other text (5), some other text (6);
pattern1 this text that text (6), that text (7), ;
not this text either;

So, if I leave all of the lines in the sed file uncommented (as above), I get:

    not (this line);
pattern1 some text, ( some other text (5), some other text (6));
pattern1 this text (      that text (6),      that text (7), );
not this text either;

If I comment out the first two lines I get

    not (this line);
pattern1 some text,  some other text (5), some other text (6);
pattern1 this text
(
     that text (6),
     that text (7),
);
not this text either;

The first line with pattern1 correctly has its surrounding parentheses removed.

If I comment out the first line I get

    not (this line);
pattern1 some text, ( some other text (5), some other text (6));
pattern1 this text (      that text (6),      that text (7), );
not this text either;

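The lines matching pattern1 are joined up to and including the semicolon, but the surrounding parentheses are no longer removed.

If I comment out the last line, I get the same, but the spaces are not removed...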

    not (this line);
pattern1 some text, ( some other text (5), some other text (6));
pattern1 this text (      that text (6),      that text (7), );
not this text either;

If I comment out the last two lines, I get:

    not (this line);
pattern1 some text, ( some other text (5), some other text (6));
pattern1 this text
(
that text (6),
that text (7),
);
not this text either;

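Leading whitespace is correctly removed from the lines that start with pattern1 and run up to the semicolon.

How do I make sure all 3 sed commands are processed in order, but using just one command? Or do I have to run them separately?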

Answer 1

When you use an address range and then enter a manual loop in the /PATTERN1 /I{ block below it, the loop conflicts with the address range.

For example:

seq 5 | sed -n '/1/,/3/{s/^/A/;p}; /1/{n;:a;/3/!{N;ba};p;}'

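Each address range "remembers" whether it has been entered, and that state carries over to the next time the command is executed. If you read ahead to the ; yourself with N or n inside a manual loop, the range never sees that line, so it stays entered until the next ; appears.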
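The effect can be reproduced on a tiny input. This is a minimal sketch, not taken from the original script: the function name range_demo and the five-line input are made up for illustration. The first command prefixes every line selected by the range with R:, while the second command reads ahead to the first ; with N, so the range never gets to test that line.

```shell
#!/bin/sh
# range_demo is a hypothetical name used only for this illustration.
range_demo() {
    printf '%s\n' 'start' 'mid' 'end;' 'other' 'later;' |
    sed -e '/start/,/;/s/^/R:/' \
        -e '/start/{' \
        -e ':a' \
        -e '/;/!{' \
        -e 'N' \
        -e 'ba' \
        -e '}' \
        -e '}'
}

range_demo
# Prints:
#   R:start
#   mid
#   end;
#   R:other
#   R:later;
```

The line other is still marked because the range only closes when a later cycle begins on a line containing ;, which here is later; rather than end;.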

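If you are looping between PATTERN1 and ; yourself anyway, then remove the ^[ \t]* after each newline yourself as well.

D deletes the pattern space only up to the first newline, so once s/\n/ / has removed every newline, D effectively deletes everything.

I guess you want: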

# if pattern is found
/PATTERN1 /I{
     # remove leading whitespaces 
     # I prefer [[:space:]]*
     s/^[ \t]*//
     # buffer everything until ';' is found
     :a; /;/!{N;ba;};
     # remove leading whitespaces after a newline
     s/\n[ \t]*/ /g; 
}
# remove the ( ... )
s/\(PATTERN1\) \([ \tA-Za-z0-9,\"\']*\)(\(.*\))[ \t]*;[ \t]*$/\1 \2\3;/I

which outputs:

    not (this line);
pattern1 some text,  some other text (5), some other text (6);
pattern1 this text  that text (6), that text (7), ;
not this text either;

Answer 2

If you have a recent GNU sed, you can run the script in debug mode (the --debug option):

SED PROGRAM:
  /PATTERN1 /I,/;/ s/^[ \t]*//g
  /PATTERN1 /I {
    :a
    /;/! N
    s/\n/ /
    t a
    P
    D
  }
  s/\(PATTERN1\) \([ \tA-Za-z0-9,\"\']*\)(\(.*\))[ \t]*;[ \t]*$/\1 \2\3;/i

<snip>

INPUT:   'infile' line 2
PATTERN: pattern1 some text, ( some other text (5), some other text (6));
COMMAND: /PATTERN1 /I,/;/ s/^[ \t]*//g
MATCHED REGEX REGISTERS
  regex[0] = 0-0 ''

<snip>

PATTERN: pattern1 some text, ( some other text (5), some other text (6));
COMMAND:   t a
COMMAND:   P
pattern1 some text, ( some other text (5), some other text (6));
COMMAND:   D
INPUT:   'infile' line 3
PATTERN: pattern1 this text

Observe how, after D, the next line is loaded into the pattern buffer, so your third command is never executed. The manual says this about D (emphasis mine):

D
If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a new line of input.

At that point your pattern space contains no newline, so you simply start a new cycle.
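A small experiment makes this visible. It is only a sketch: the function names with_pd and without_pd and the two-line input are invented for the demonstration. With P and D in front, the trailing substitution is never reached, because D on a pattern space without a newline ends the cycle.

```shell
#!/bin/sh
# Hypothetical helper names, used only for this demonstration.

# P prints the line, D (no newline present) deletes it and starts a new cycle,
# so the s command after D never runs.
with_pd() {
    printf '%s\n' 'one' 'two' | sed -e 'P' -e 'D' -e 's/^/reached: /'
}

# Without P and D the same substitution runs on every line.
without_pd() {
    printf '%s\n' 'one' 'two' | sed -e 's/^/reached: /'
}

with_pd
# Prints:
#   one
#   two

without_pd
# Prints:
#   reached: one
#   reached: two
```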

It looks like your script can be fixed like this:

/PATTERN1 /I,/;/ s/^[ \t]*//g
/PATTERN1 /I {
    :a
    /;/! N
    s/\n/ /
    t a
    s/[[:blank:]]\{1,\}/ /g
}

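You don't need the P;D idiom; it is normally used when you want to slide a window over multiple lines. Instead of your third command, I added a substitution after the loop in your second command.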
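For contrast, here is a case where the N;P;D idiom does fit: a two-line sliding window that drops adjacent duplicate lines. This is the well-known one-liner, shown as an illustration rather than as part of the fix; the function name dedupe_adjacent is made up.

```shell
#!/bin/sh
# dedupe_adjacent is a hypothetical name; it reads stdin and writes stdout.
dedupe_adjacent() {
    # $!N  append the next line unless this is the last one
    # !P   print the first line of the window unless both lines are equal
    # D    drop the first line and restart the cycle with the remainder
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf '%s\n' a a b | dedupe_adjacent
# Prints:
#   a
#   b
```

Here D usually finds a newline in the pattern space, so it keeps the second line of the window for the next pass instead of discarding everything.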

Answer 3

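sed is the best tool for doing s/old/new on individual strings. What you are doing is far more complicated than that, so you shouldn't be considering sed. This will produce the expected output from your posted sample input using any awk in any shell on every UNIX box: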

$ cat tst.awk
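# Start collecting when a line begins with "pattern1" (case-insensitive).
tolower($0) ~ tolower("^pattern1") { inBlock = 1 }
inBlock {
    block = block $0 ORS
    # The block is complete once a line ends with ");".
    if ( sub(/\);\n/,";",block) ) {
        sub(/\(/,"",block)                # drop the first "("
        gsub(/[[:space:]]+/," ",block)    # squeeze each run of white space
        print block
        block = ""
        inBlock = 0
    }
    next
}
{ print }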

$ awk -f tst.awk file
    not (this line);
pattern1 some text, some other text (5), some other text (6);
pattern1 this text that text (6), that text (7), ;
not this text either;

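It just looks for a line that starts with "pattern1"; when it finds one, it builds a block of text from there up to the first ); it finds at the end of a line, then removes the first ( and the last ), converts every chain of white space to a single blank, and prints the block. No arcane, cryptic, single-character runes are involved, just a clear, simple program that will run on any UNIX box and will be easy to enhance in the future if/when you need to do anything else.

If you don't mind a GNU-specific solution, here is a simpler GNU awk solution that relies only on each record being terminated by ;\n: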

$ cat tst.awk
BEGIN {
    RS=ORS=";\n"
    IGNORECASE=1
}
/^pattern1/ {
    $0 = gensub(/\((.*)\)/,"\\1",1)
    gsub(/[[:space:]]+/," ")
}
{ print }

$ awk -f tst.awk file
    not (this line);
pattern1 some text, some other text (5), some other text (6);
pattern1 this text that text (6), that text (7), ;
not this text either;

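If that isn't all you need, then post a new question with input for which the above doesn't work, and tag it with awk. But don't keep trying to do things like this with sed; it is simply the wrong tool.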
