What is a "string-ending newline" in relation to the $ metacharacter?

Question

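I'm studying regular expressions with Mastering Regular Expressions, 3rd Edition, and I found that $ is more complicated than ^. That surprised me, because I thought they were "symmetrical", except when they are escaped into their literal counterparts.

In fact, on page 129 the two are described slightly differently, with more words devoted to $; however, I'm still confused about it.

For ^, only two clear alternatives are described:

Caret ^ matches at the beginning of the text being searched, and, if in an enhanced line-anchor mode, after any newline.

For $, the description is more obscure to me: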

$ [...] matches at the end of the target string, and before a string-ending newline, as well. The latter is common, to allow an expression like s$ (ostensibly, to match "a line ending with s") to match …s<NL>, a line ending with s that's capped with an ending newline.

Two other common meanings for $ are to match only at the end of the target text, and to match before any newline.

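The latter two meanings seem fairly symmetrical to those described for ^, but what about the string-ending newline meaning?

Searching for [regex] "string-ending newline" currently yields only three results (one, two, and three), all of which quote: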

$ Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.

Answer 1

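The point is that $ matches in both places: just before a trailing newline, and at the very end of the file or input string, which may or may not end with a newline.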
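To make the two positions visible, here is a minimal Perl sketch that collects every offset where a bare $ (no /m) matches. The helper name dollar_offsets is made up for this illustration; it relies only on the /g modifier and the @- array.

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which a bare /$/ matches.
sub dollar_offsets {
    my ($text) = @_;
    my @offsets;
    # In scalar context, /g walks through successive (zero-width) matches;
    # $-[0] is the offset where the current match starts.
    push @offsets, $-[0] while $text =~ /$/g;
    return @offsets;
}

for my $text ("foo", "foo\n", "foo\n\n") {
    (my $shown = $text) =~ s/\n/\\n/g;    # make newlines visible when printing
    printf "%-10s => %s\n", qq("$shown"), join(', ', dollar_offsets($text));
}
```

For a string that ends in a newline, two offsets should come back: the one just before the final newline and the one at the very end. For a string without a trailing newline there is only the end.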

Answer 2

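The zero-width assertion $ asserts a position at the end of the string, or before the line terminator at the very end of the string (if there is one).

These Perl snippets should make it clearer: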

$str = 'abc
foo';
$str =~ s/\w+$/#/;
print "1. <" . $str . ">\n\n";

$str = 'abc
foo
';
$str =~ s/\w+$/#/;
print "2. <" . $str . ">\n\n";

$str = 'abc
foo

';
$str =~ s/\w+$/#/;
print "3. <" . $str . ">\n\n";

This produces the following output:

1. <abc
#>

2. <abc
#
>

3. <abc
foo

>

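As you can see, $ matches in cases 1 and 2, because $ matches at the end of the string (case 1) or before the line break right at the end (case 2). Case 3, however, remains unmatched, because the line break that follows foo is not the one at the very end of the string.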

Answer 3

See whether the following code helps clarify the meaning of $ in regular expressions; the \n substitutions are included for comparison.

use strict;
use warnings;
use feature 'say';

my $str = 'abc
foo
bar
';

my $str_test;

$str_test = $str;
$str_test =~ s/(.+)$/[$1]/;
say '-' x 30;
say ' regex :: s/(.+)$/[$1]/';
say '-' x 30;
say $str_test;

$str_test = $str;
$str_test =~ s/(.+)$/[$1]/s;
say '-' x 30;
say ' regex :: s/(.+)$/[$1]/s';
say '-' x 30;
say $str_test;

$str_test = $str;
$str_test =~ s/\n/[NL]\n/s;
say '-' x 30;
say ' regex :: s/\n/[NL]\n/s';
say '-' x 30;
say $str_test;

$str_test = $str;
$str_test =~ s/\n/[NL]\n/g;
say '-' x 30;
say ' regex :: s/\n/[NL]\n/g';
say '-' x 30;
say $str_test;

Output

------------------------------
 regex :: s/(.+)$/[$1]/
------------------------------
abc
foo
[bar]

------------------------------
 regex :: s/(.+)$/[$1]/s
------------------------------
[abc
foo
bar
]
------------------------------
 regex :: s/\n/[NL]\n/s
------------------------------
abc[NL]
foo
bar

------------------------------
 regex :: s/\n/[NL]\n/g
------------------------------
abc[NL]
foo[NL]
bar[NL]

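Answer 4

"String-ending newline" means a newline that is the last character of the string.

Without `/m`

$ matches before a newline at the end of the string, and at the end of the string.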

"abc\ndef\n" =~ /^abc$/           # Doesn't match at embedded line feed
"abc\ndef\n" =~ /^abc\n$/         # Doesn't match after embedded line feed
"abc\ndef\n" =~ /^abc\ndef$/      # Matches at string-ending line feed
"abc\ndef\n" =~ /^abc\ndef\n$/    # Matches at end of string

This is equivalent to \Z, which in turn is equivalent to (?=\n\z|\z).
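The claimed equivalence can be spot-checked with a small Perl sketch that runs the three spellings against a few sample strings and compares where each match ends. The helper match_end and the sample strings are assumptions chosen for this illustration, and a handful of samples is of course not a proof.

```perl
use strict;
use warnings;

# Three spellings of "end of string, or just before a final newline" (no /m).
my @anchors = (
    [ '$'           => qr/\w+$/ ],
    [ '\Z'          => qr/\w+\Z/ ],
    [ '(?=\n\z|\z)' => qr/\w+(?=\n\z|\z)/ ],
);

# Hypothetical helper: offset where the match ends, or -1 if nothing matches.
sub match_end {
    my ($text, $re) = @_;
    return $text =~ $re ? $+[0] : -1;
}

for my $text ("abc", "abc\n", "abc\n\n", "abc\ndef\n") {
    (my $shown = $text) =~ s/\n/\\n/g;    # make newlines visible when printing
    my @ends = map { match_end($text, $_->[1]) } @anchors;
    printf "%-12s => %s\n", qq("$shown"), join(' ', @ends);
}
```

Each output row should show the same number three times, with -1 for "abc\n\n", where no word character sits directly before the final newline.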

With `/m`

$ matches before any newline, and at the end of the string.

"abc\ndef\n" =~ /^abc$/m          # Matches at embedded line feed
"abc\ndef\n" =~ /^abc\n$/m        # Doesn't match after embedded line feed
"abc\ndef\n" =~ /^abc\ndef$/m     # Matches at string-ending line feed
"abc\ndef\n" =~ /^abc\ndef\n$/m   # Matches at end of string

This is equivalent to (?=\n|\z).

\z is used for an exact match.

/xyz\z/    # String ends with "xyz"

$ is used to ignore a trailing newline.

/xyz$/     # Line ends with "xyz". The string might end with a line feed.

For example,

"jkl"   =~ /^jkl$/     # Matches at end of string
"jkl"   =~ /^jkl\z/    # Matches at end of string

"jkl\n" =~ /^jkl$/     # Matches at string-ending line feed
"jkl\n" =~ /^jkl\z/    # Doesn't match at string-ending line feed

$ is useful when matching a line you haven't chomped.

while (<>) {
   next if /^foo$/;
   ...
}

\z is useful the rest of the time.
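As a rough illustration of that rule of thumb, the sketch below reads unchomped lines from an in-memory file and records which line numbers each anchor accepts. The sample data and the variable names are made up for this example; note that the last line deliberately has no trailing newline.

```perl
use strict;
use warnings;

my $data = "foo\nbar\nfoo";    # assumed sample input; last line lacks "\n"
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my (@dollar_hits, @z_hits);
while (my $line = <$fh>) {
    # $line still carries its "\n" (when present) because it is not chomped.
    push @dollar_hits, $. if $line =~ /^foo$/;     # tolerates a trailing newline
    push @z_hits,      $. if $line =~ /^foo\z/;    # requires the true end of string
}
close $fh;

print "\$ matched lines:  @dollar_hits\n";
print "\\z matched lines: @z_hits\n";
```

With unchomped input, /^foo$/ accepts both foo lines, while /^foo\z/ accepts only the final one, which has no newline. After a chomp the two patterns would agree on this data.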

Note that other regex engines may behave differently, even Perl-like ones. For example, in JavaScript, $ without /m matches only at the end of the string.
