Which characters combined with ^ don't need to be escaped in sed?
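I checked that `^*` and `^&` match lines starting with `*` and `&`, even though I didn't escape them and they are special characters. But `^[` does not work. Is this "standard" behavior? Is there some reasoning behind it?

The `sed` version used is "GNU sed 4.4".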
See the sed documentation, "3.3 Overview of Regular Expression Syntax".
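The `&` character is not a special regex character, so it does not need to be escaped in a regex pattern. Note that `&` is parsed as a special construct in the *replacement* part of `s`, where it refers to the whole match.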
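The following is a quick check of both roles. It assumes GNU sed and a POSIX shell. In the address `/^&/`, the ampersand is an ordinary character. In the replacement of `s`, it expands to the matched text unless it is written as `\&`.

```shell
#!/bin/sh
# In the pattern, & is an ordinary character.
printf '&amp\nplain\n' | sed -n '/^&/p'     # -> &amp
# In the replacement, & stands for the whole match...
echo '&x' | sed 's/^&/[&]/'                 # -> [&]x
# ...so write \& there when a literal ampersand is wanted.
echo '&x' | sed 's/^&/\&\&/'                # -> &&x
```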
`*` is not special at the start of a pattern in GNU `sed` (`^*` is a pattern that matches a `*` at the start of the string):
POSIX 1003.1-2001 says that `*` stands for itself when it appears at the start of a regular expression or subexpression, but many nonGNU implementations do not support this and portable scripts should instead use `\*` in these contexts.
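Here is a small sketch, assuming GNU sed. It contrasts three cases:

- the GNU-accepted `^*`
- the portable `^\*` that the manual recommends
- the POSIX rule that `*` is also literal as the first character of a subexpression

```shell
#!/bin/sh
# GNU sed: * right after the ^ anchor is a literal asterisk.
printf '*star\nplain\n' | sed -n '/^*/p'      # -> *star
# Portable spelling recommended by the GNU manual.
printf '*star\nplain\n' | sed -n '/^\*/p'     # -> *star
# POSIX: * is also literal as the first character of a subexpression.
printf 'a*b\n' | sed 's/a\(*\)b/<\1>/'        # -> <*>
# Anywhere else it is the repetition operator.
printf 'aaa\n' | sed 's/a*/X/'                # -> X
```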
`[` starts a bracket expression and must be paired with a `]` that ends it, so an unterminated `^[` is an error.
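The sketch below assumes GNU sed 4.x. It shows that `^[` is rejected, then two ways of matching a literal leading `[`:

- escape it
- place it inside a bracket expression, where `[` loses its special meaning

The error message differs between versions, so the script only looks at the exit status.

```shell
#!/bin/sh
# An unescaped [ opens a bracket expression; with no closing ] the script is
# rejected (only the exit status is checked, not the message text).
if printf '[x\n' | sed -n '/^[/p' 2>/dev/null; then
  echo 'unexpected: /^[/ was accepted'
else
  echo 'sed rejected /^[/'
fi

# Two ways to match a literal [ at the start of a line:
printf '[x\ny\n' | sed -n '/^\[/p'    # escape it          -> [x
printf '[x\ny\n' | sed -n '/^[[]/p'   # bracket expression -> [x
```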
From POSIX.1-2017:
The sed utility shall support the BREs described in XBD Basic Regular Expressions, ... [sed]
Reading the POSIX section on BREs, we find:
A BRE special character has special properties in certain contexts. Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows:
`.[\`: The <period>, <left-square-bracket>, and <backslash> shall be special except when used in a bracket expression (see RE Bracket Expression). An expression containing a '[' that is unescaped and is not part of a bracket expression produces undefined results.

`*`: The <asterisk> shall be special except when used:

- In a bracket expression
- As the first character of an entire BRE (after an initial '^', if any)
- As the first character of a subexpression (after an initial '^', if any); see BREs Matching Multiple Characters

`^`: The <circumflex> shall be special when used as an anchor (see BRE Expression Anchoring). The <circumflex> shall signify a non-matching list expression when it occurs first in a list, immediately following a <left-square-bracket> (see RE Bracket Expression).

`$`: The <dollar-sign> shall be special when used as an anchor.
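The list above can be turned into a tool. `escape_bre` is a hypothetical helper, not part of sed. It backslash-escapes `\`, `.`, `[`, `*`, `^` and `$`, so that an arbitrary string can be matched literally. It also escapes `/`, because the usage below uses `/` as the delimiter.

This is a sketch under these assumptions:

- a POSIX shell
- a single-line input string
- `]` is left unescaped, because outside a bracket expression it is already ordinary

```shell
#!/bin/sh
# escape_bre: hypothetical helper (not part of sed). Backslash-escapes the BRE
# special characters from the POSIX list (\ . [ * ^ $) plus "/", the delimiter
# used below. "]" is left alone because it is ordinary outside a bracket
# expression. Assumes the input string contains no newline.
escape_bre() {
  # Inside [...] these characters are all ordinary; "|" delimits the s command
  # so "/" can sit in the list without escaping.
  printf '%s\n' "$1" | sed 's|[\.[*^$/]|\\&|g'
}

needle='a.b*[c]^$\_/'
pattern=$(escape_bre "$needle")
printf '%s\n' "$pattern"    # -> a\.b\*\[c]\^\$\\_\/

# Only the line containing the literal needle is printed.
printf '%s\n' "x${needle}y" 'axb-c' | sed -n "/$pattern/p"
```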
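So, applying the above to the OP's question:

- `&` is not a special character, so `^&` should work.
- `[` should always be escaped when it is not used to start a bracket expression.
- `*` is not special right after an initial `^`, which is an anchor.

Therefore, every behavior the OP observed is the standard one.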
There is, however, an interesting passage in RE Bracket Expression:
A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions. The <right-square-bracket> ( `]` ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial <circumflex> ( `^` ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as `[.].]` ) or is the ending <right-square-bracket> for a collating symbol, equivalence class, or character class. The special characters `.`, `*`, `[`, and `\` ( <period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.
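This means that `]` cannot be escaped inside a bracket expression, because a backslash there is just an ordinary character. In practice: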
The following works:

```shell
$ echo '[]' | sed 's/[^]x]/a/'
a]
$ echo '[]' | sed 's/[^x[.].]]/a/'
a]
```
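But this does not work as expected:

```shell
$ echo '[]' | sed 's/[^x\]]/a/'
a
```

Because `\` is an ordinary character inside a bracket expression, `[^x\]` is already a complete bracket expression (any character except `x` or `\`). The next `]` is therefore a literal, and the pattern matches the two characters `[]` rather than just `[`.

So inside a bracket expression, don't escape `]`: put it first, or collate it!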
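To round off the bracket-expression rules, here is a sketch of the usual idiom, assuming GNU sed:

- list `]` first so that it is literal
- remember that `\` inside brackets is just another member

```shell
#!/bin/sh
# "]" is literal only as the first list member (after an optional "^"), and
# "[" not followed by ".", "=" or ":" is ordinary inside a list.
echo 'a[b]c' | sed 's/[][]//g'     # delete every [ and ]    -> abc
echo 'a[b]c' | sed 's/[^][]//g'    # delete everything else  -> []

# A backslash is just another list member: [\]] is the list {\} followed by
# a literal "]", i.e. it matches the two-character string \]
printf '%s\n' 'a\]b' | sed 's/[\]]/X/'   # -> aXb
```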