What is a "string-ending newline" in relation to the $ metacharacter?

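I'm studying regular expressions with Mastering Regular Expressions, 3rd Edition, and I found that $ is more complicated than ^. That surprised me, because I thought of them as "symmetrical", except when they are escaped to their literal counterparts.

In fact, on page 129 the two are described slightly differently, with more words spent on $; however, I'm still confused about it.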

Caret ^ matches at the beginning of the text being searched, and, if in an enhanced line-anchor mode, after any newline. [...]

$ [...] matches at the end of the target string, and before a string-ending newline, as well. The latter is common, to allow an expression like s$ (ostensibly, to match "a line ending with s") to match …s<NL>, a line ending with s that's capped with an ending newline.

Two other common meanings for $ are to match only at the end of the target text, and to match before any newline.

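The latter two meanings seem fairly symmetrical to those described for ^, but what about the string-ending newline meaning?

Searching for [regex] "string-ending newline" currently returns only three results, all of which refer to: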

$ Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.

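The point is that $ matches both before a newline and at the end of the file or input string, which may or may not end with a newline.

The zero-width assertion $ asserts the position at the end of the string, or before the line terminator at the end of the string (if there is one).

These Perl snippets should make it clearer: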

$str = 'abc
foo';
$str =~ s/\w+$/#/;
print "1. <" . $str . ">\n\n";

$str = 'abc
foo
';
$str =~ s/\w+$/#/;
print "2. <" . $str . ">\n\n";

$str = 'abc
foo

';
$str =~ s/\w+$/#/;
print "3. <" . $str . ">\n\n";

This produces the following output:

1. <abc
#>

2. <abc
#
>

3. <abc
foo

>

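As you can see, $ matches in cases 1 and 2, because $ matches at the end of the string (case 1) or before the line break right at the end (case 2). Case 3 remains unmatched, however, because the line break that follows foo is not the last character of the string.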
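To see exactly where $ is allowed to match in each of the three strings, the offsets can be listed directly. This is only a minimal sketch; the helper name dollar_positions is made up for illustration and is not part of the snippets above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which a bare $ (no /m) matches.
sub dollar_positions {
    my ($str) = @_;
    my @positions;
    # $-[0] holds the start offset of the most recent successful match.
    push @positions, $-[0] while $str =~ /$/g;
    return @positions;
}

print join(',', dollar_positions("abc\nfoo")),     "\n";   # case 1: prints 7
print join(',', dollar_positions("abc\nfoo\n")),   "\n";   # case 2: prints 7,8
print join(',', dollar_positions("abc\nfoo\n\n")), "\n";   # case 3: prints 8,9
```

In case 3 the word foo ends at offset 7, which is not in the list, so \w+$ has nowhere to match.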

See whether the following code helps clarify the meaning of $ in a regex; substitutions on \n are added for comparison.

use strict;
use warnings;
use feature 'say';

my $str = 'abc
foo
bar
';

my $str_test;

# Without /s, . does not match a newline, so only the last line is captured.
$str_test = $str;
$str_test =~ s/(.+)$/[$1]/;
say '-' x 30;
say ' regex :: s/(.+)$/[$1]/';
say '-' x 30;
say $str_test;

# With /s, . also matches newlines, so the whole string is captured.
$str_test = $str;
$str_test =~ s/(.+)$/[$1]/s;
say '-' x 30;
say ' regex :: s/(.+)$/[$1]/s';
say '-' x 30;
say $str_test;

# No /g: only the first newline is marked.
$str_test = $str;
$str_test =~ s/\n/[NL]\n/s;
say '-' x 30;
say ' regex :: s/\n/[NL]\n/s';
say '-' x 30;
say $str_test;

# With /g: every newline is marked.
$str_test = $str;
$str_test =~ s/\n/[NL]\n/g;
say '-' x 30;
say ' regex :: s/\n/[NL]\n/g';
say '-' x 30;
say $str_test;

Output

------------------------------
 regex :: s/(.+)$/[$1]/
------------------------------
abc
foo
[bar]

------------------------------
 regex :: s/(.+)$/[$1]/s
------------------------------
[abc
foo
bar
]
------------------------------
 regex :: s/\n/[NL]\n/s
------------------------------
abc[NL]
foo
bar

------------------------------
 regex :: s/\n/[NL]\n/g
------------------------------
abc[NL]
foo[NL]
bar[NL]

"String-ending newline" means a newline that is the last character of the string.


Without /m

$ matches before a newline that ends the string, and at the end of the string.

"abc\ndef\n" =~ /^abc$/           # Doesn't match at embedded line feed
"abc\ndef\n" =~ /^abc\n$/         # Doesn't match after embedded line feed
"abc\ndef\n" =~ /^abc\ndef$/      # Matches at string-ending line feed
"abc\ndef\n" =~ /^abc\ndef\n$/    # Matches at end of string

This is equivalent to \Z, which in turn is equivalent to (?=\n\z|\z).
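The equivalence can be spot-checked by listing the match offsets for each of the three forms. This is a rough sketch rather than a proof; the helper name positions and the sample strings are chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets at which the compiled pattern $re matches in $str.
sub positions {
    my ($re, $str) = @_;
    my @found;
    push @found, $-[0] while $str =~ /$re/g;
    return join ',', @found;
}

for my $str ("abc", "abc\n", "abc\ndef\n", "abc\n\n") {
    # The three forms under comparison, all compiled without /m.
    my @results = map { positions($_, $str) } qr/$/, qr/\Z/, qr/(?=\n\z|\z)/;
    (my $shown = $str) =~ s/\n/\\n/g;   # make newlines visible when printing
    print qq{"$shown": $results[0] | $results[1] | $results[2]\n};
}
```

Each row shows the same offsets in all three columns, for example 7,8 for "abc\ndef\n".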

With /m

$ matches before any newline, and at the end of the string.

"abc\ndef\n" =~ /^abc$/m           # Matches at embedded line feed
"abc\ndef\n" =~ /^abc\n$/m         # Doesn't match after embedded line feed
"abc\ndef\n" =~ /^abc\ndef$/m      # Matches at string-ending line feed
"abc\ndef\n" =~ /^abc\ndef\n$/m    # Matches at end of string

This is equivalent to (?=\n|\z).
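As a small illustration of the /m behaviour, the last word of every line can be collected either with $ under /m or with the lookahead spelled out. The variable names are invented for this sketch.

```perl
use strict;
use warnings;

my $text = "abc\ndef\n";

# With /m, $ matches before every newline and at the very end of the string.
my @with_dollar    = $text =~ /(\w+)$/mg;
# The same condition written as an explicit lookahead (no /m needed).
my @with_lookahead = $text =~ /(\w+)(?=\n|\z)/g;

print "@with_dollar\n";      # prints "abc def"
print "@with_lookahead\n";   # prints "abc def"
```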


\z is for exact matching.

/xyz\z/    # String ends with "xyz"

$ is for ignoring a trailing line feed.

/xyz$/     # Line ends with "xyz". The string might end with a line feed.

For example,

"jkl"   =~ /^jkl$/     # Matches at end of string
"jkl"   =~ /^jkl\z/    # Matches at end of string

"jkl\n" =~ /^jkl$/     # Matches at string-ending line feed
"jkl\n" =~ /^jkl\z/    # Doesn't match at string-ending line feed

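The distinction matters most when validating input. The following is a sketch with two made-up validators, both intended to accept digits only; the names digits_dollar and digits_z are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical validators: both are meant to accept a string of digits only.
sub digits_dollar { return $_[0] =~ /^\d+$/  ? 1 : 0 }   # tolerates one trailing newline
sub digits_z      { return $_[0] =~ /^\d+\z/ ? 1 : 0 }   # tolerates nothing after the digits

for my $input ("123", "123\n", "123\n\n") {
    (my $shown = $input) =~ s/\n/\\n/g;   # make newlines visible when printing
    print qq{"$shown"}, ' => $: ', digits_dollar($input), ', \z: ', digits_z($input), "\n";
}
# "123"     => $: 1, \z: 1
# "123\n"   => $: 1, \z: 0
# "123\n\n" => $: 0, \z: 0
```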
$ is useful when matching lines you haven't chomped.

while (<>) {
   next if /^foo$/;
   ...
}
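A line read from a filehandle keeps its trailing newline unless chomp removes it, which is why $ is convenient in loops like this one. Here is a minimal sketch using an in-memory filehandle; the sample data is invented.

```perl
use strict;
use warnings;

my $data = "foo\nfoobar\nfoo";   # the last line has no trailing newline
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my (@unchomped, @chomped);
while (my $line = <$fh>) {
    push @unchomped, ($line =~ /^foo$/ ? 1 : 0);    # newline still attached
    chomp $line;
    push @chomped,   ($line =~ /^foo\z/ ? 1 : 0);   # newline removed
}
close $fh;

print "@unchomped\n";   # prints "1 0 1"
print "@chomped\n";     # prints "1 0 1"
```

Once the line has been chomped, \z gives the same answer as $ did on the raw line.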

\z is useful the rest of the time.


Note that other regex engines may behave differently, even Perl-like ones. For example, in JavaScript, $ without /m matches only at the end of the string.