Which characters combined with ^ don't need to be escaped in sed?

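I checked that ^* and ^& match lines starting with * and &, even though I didn't escape them and they are special characters. But ^[ doesn't work. Is this "standard" behavior? Is there a rationale behind it?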

The sed version is "GNU sed 4.4".

See the GNU sed documentation, section "3.3 Overview of Regular Expression Syntax".

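The & character is not a special regular expression character, so it does not need to be escaped in a regex pattern. Note, however, that & is a special construct in the replacement part of the s command, where it stands for the whole match.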
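A quick check of both halves of that statement, assuming GNU sed and a POSIX sh (the input strings are made up for illustration): in the pattern, ^& anchors a literal &; in the replacement, a bare & is the whole match and \& is a literal ampersand.

```shell
#!/bin/sh
# Pattern side: & is an ordinary character, so ^& matches a literal "&" at line start.
printf '%s\n' '&amp' | sed 's/^&/AND:/'    # prints: AND:amp
# Replacement side: an unescaped & is replaced by the whole match (here "&").
printf '%s\n' '&amp' | sed 's/^&/[&]/'     # prints: [&]amp
# Replacement side: \& inserts a literal ampersand instead of the match.
printf '%s\n' 'x' | sed 's/x/\&/'           # prints: &
```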

* is not special at the start of a regular expression in GNU sed (^* is a pattern that matches a literal * at the start of the line):

POSIX 1003.1-2001 says that * stands for itself when it appears at the start of a regular expression or subexpression, but many nonGNU implementations do not support this and portable scripts should instead use \* in these contexts.
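A small sketch of that rule, assuming GNU sed: right after the leading ^ the * is literal, the portable \* spelling behaves the same, and anywhere else * is still the repetition operator (so a* can match the empty string).

```shell
#!/bin/sh
# GNU sed: a * immediately after the leading ^ is an ordinary character.
printf '%s\n' '*star' 'plain' | sed -n '/^*/p'     # prints: *star
# Portable spelling recommended by the GNU manual: escape it.
printf '%s\n' '*star' 'plain' | sed -n '/^\*/p'    # prints: *star
# Elsewhere * repeats the previous item; "a*" matches zero a's at position 0.
printf '%s\n' 'bbb' | sed 's/a*/X/'                # prints: Xbbb
```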

[ starts a bracket expression, which must be closed by a matching ], so an unescaped ^[ on its own is an error.
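To see the difference, here is a sketch assuming GNU sed (the exact error message varies between versions, so only the exit status is checked): the unescaped form is rejected when the script is compiled, while escaping the [ or wrapping it in a bracket expression both work.

```shell
#!/bin/sh
# An unescaped [ opens a bracket expression that is never closed, so sed
# rejects the script before reading any input and exits with a non-zero status.
if sed 's/^[/Y/' </dev/null 2>/dev/null; then
  echo 'unexpected: ^[ was accepted'
else
  echo 'rejected: ^['                      # this branch runs with GNU sed
fi
# Fix 1: escape the bracket.
printf '%s\n' '[x' | sed 's/^\[/Y/'         # prints: Yx
# Fix 2: put [ inside a bracket expression, where it is an ordinary character.
printf '%s\n' '[x' | sed 's/^[[]/Y/'        # prints: Yx
```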

From POSIX.1-2017:

The sed utility shall support the BREs described in XBD Basic Regular Expressions, ... [sed]

Reading the POSIX section on BREs, we find:

A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows:

  • .[\: The <period>, <left-square-bracket>, and <backslash> shall be special except when used in a bracket expression (see RE Bracket Expression). An expression containing a '[' that is unescaped and is not part of a bracket expression produces undefined results.
  • *: The <asterisk> shall be special except when used:
    • In a bracket expression
    • As the first character of an entire BRE (after an initial '^', if any)
    • As the first character of a subexpression (after an initial '^', if any); see BREs Matching Multiple Characters
  • ^: The <circumflex> shall be special when used as an anchor (see BRE Expression Anchoring). The <circumflex> shall signify a non-matching list expression when it occurs first in a list, immediately following a <left-square-bracket> (see RE Bracket Expression).
  • $: The <dollar-sign> shall be special when used as an anchor.

source: Basic Regular Expressions, Special characters
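The contexts listed above can be exercised with a few one-liners. This is a sketch assuming GNU sed in its default BRE mode; the inputs are invented for illustration. Each pair shows the same character being special in one position and ordinary in another.

```shell
#!/bin/sh
# . is special outside a bracket expression but ordinary inside one.
printf '%s\n' 'a.b axb' | sed 's/a.b/X/g'     # prints: X X
printf '%s\n' 'a.b axb' | sed 's/a[.]b/X/g'   # prints: X axb
# ^ is an anchor only at the start of the BRE; in the middle it is literal.
printf '%s\n' 'a^b' | sed 's/a^b/X/'          # prints: X
# $ is an anchor only at the end of the BRE; in the middle it is literal.
printf '%s\n' 'a$b' | sed 's/a$b/X/'          # prints: X
# * inside a bracket expression is just an asterisk.
printf '%s\n' 'a*b' | sed 's/[*]/X/'          # prints: aXb
```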

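So, applying the above to the OP's question:

  • & is not a special character in a regular expression, so ^& should work.
  • [ should always be escaped when it is not meant to start a bracket expression.
  • * is not special right after an initial ^, which is an anchor.

So all of the behavior the OP observed is consistent with the standard.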

There is, however, another interesting passage in RE Bracket Expression:

A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions. The <right-square-bracket> ( ] ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial <circumflex>( ^ ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as [.].] ) or is the ending <right-square-bracket> for a collating symbol, equivalence class, or character class. The special characters ., *, [, and \ ( <period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.

source: Basic Regular Expressions, RE Bracket Expression

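This means that ] cannot be escaped inside a bracket expression, where a backslash is just an ordinary character. So the following work: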

$ echo '[]' | sed 's/[^]x]/a/'
a]
$ echo '[]' | sed 's/[^x[.].]]/a/'
a]

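But this does not work as expected:

$ echo '[]' | sed 's/[^x\]]/a/'
a

Inside the brackets the backslash is an ordinary character, so [^x\] is already a complete bracket expression (any character except x or \), and the second ] is a literal ]. The pattern therefore matches the whole two-character string [] and replaces it with a single a, instead of replacing only the [.

So inside a bracket expression, don't escape ]: put it first in the list, or collate it ([.].])!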
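A sketch that makes the backslash behavior visible, assuming GNU sed and a POSIX sh (the input line is made up; printf is used instead of echo because echo's handling of backslashes differs between shells). The input contains both a backslash and a ], so you can see which one each pattern actually matches.

```shell
#!/bin/sh
# Hypothetical input: the four characters a, \, ], b.
line='a\]b'
# [\]] is the bracket expression [\] (a backslash) followed by a literal ],
# so it matches the two-character sequence \] .
printf '%s\n' "$line" | sed 's/[\]]/X/'    # prints: aXb
# []] lists ] first, so it is a bracket expression matching a single ].
printf '%s\n' "$line" | sed 's/[]]/X/'     # prints: a\Xb
```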