Why does this sed command output "[18" instead of "18"?

Question

echo [18%] | sed s:[\[%\]]::g

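I'm really confused, because exactly the same pattern successfully replaces [18%] in vim. I also tested the expression in a few online regex tools, and they all say it will match [, % and ] as expected. I have tried adding the -r option as well as wrapping the substitution command in quotes.

I know there are other commands I could use for this task, but I want to know why it behaves this way so that I can understand sed better.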

Answer 1

$ echo [18%] | sed s:[][%]::g
18

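sed supports the POSIX.2 regular expression syntax: basic (BRE) by default, and extended (ERE) with the -r flag. In POSIX.2 syntax, basic or extended, you include a right square bracket in a character class by making it the first character of the class. A backslash does not help, because it is an ordinary character inside a bracket expression. In the original pattern the class therefore ends at the first ], so [\[%\] matches one of \, [ or %, and the final ] must then match literally. Only "%]" fits, which leaves "[18".

This is annoying because nearly every other modern language and tool uses Perl or Perl-like regular expression syntax. The POSIX syntax is an anachronism.

You can read about the POSIX.2 syntax in the regex(7) manual page.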
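To check this behavior directly, the sketch below runs the original pattern and the corrected one side by side. The patterns are quoted so the shell passes them to sed unchanged, the variable names are only illustrative, and the outputs shown in the comments assume a POSIX-conforming sed such as GNU sed.

```shell
# Original pattern: the class is [\[%\] (one of \, [ or %), followed by a literal ].
broken=$(printf '%s\n' '[18%]' | sed 's:[\[%\]]::g')

# Corrected pattern: ] comes first in the class, so it is taken literally.
fixed=$(printf '%s\n' '[18%]' | sed 's:[][%]::g')

# The original pattern also removes a backslash followed by ].
side_effect=$(printf '%s\n' 'a\]b' | sed 's:[\[%\]]::g')

printf 'broken:      %s\n' "$broken"       # [18
printf 'fixed:       %s\n' "$fixed"        # 18
printf 'side_effect: %s\n' "$side_effect"  # ab
```

Quoting the sed script is a good habit in any case: unquoted, the shell may treat the brackets as a glob or strip the backslashes before sed sees them.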

 A bracket expression is a list of characters enclosed in "[]". It normally
 matches any single character from the list (but see below). If the list begins
 with '^', it matches any single character (but see below) not from the rest of
 the list. If two characters in the list are separated by '-', this is shorthand
 for the full range of characters between those two (inclusive) in the collating
 sequence, for example, "[0-9]" in ASCII matches any decimal digit. It is
 illegal(!) for two ranges to share an endpoint, for example, "a-c-e". Ranges are
 very collating-sequence-dependent, and portable programs should avoid relying on
 them.

 To include a literal ']' in the list, make it the first character (following a
 possible '^'). To include a literal '-', make it the first or last character, or
 the second endpoint of a range. To use a literal '-' as the first endpoint of a
 range, enclose it in "[." and ".]" to make it a collating element (see below).
 With the exception of these and some combinations using '[' (see next
 paragraphs), all other special characters, including '\', lose their special
 significance within a bracket expression.
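The placement rules quoted above can be tried out with a short sketch like the following. The sample inputs and variable names are made up for illustration, and the results in the comments assume a POSIX-conforming sed such as GNU sed.

```shell
# ']' directly after '^' and '-' as the last character are both literal:
# delete every character that is not a digit, ']' or '-'.
kept=$(printf '%s\n' '[1-8%]' | sed 's:[^]0-9-]::g')

# A backslash is an ordinary character inside a bracket expression:
# [\] is a class containing only '\'.
slashed=$(printf '%s\n' 'a\b' | sed 's:[\]:/:')

printf 'kept:    %s\n' "$kept"     # 1-8]
printf 'slashed: %s\n' "$slashed"  # a/b
```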
