What is a "string-ending newline" in relation to the $ metacharacter?
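I'm studying regular expressions from Mastering Regular Expressions, 3rd Edition, and I've found that $ is more complicated than ^. That surprised me, since I thought the two were "symmetrical", except when they are escaped to their literal counterparts.
Indeed, on page 129 they are described slightly differently, with many more words spent on $; however, I'm still confused about it.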
- Concerning ^, only two clear alternatives are described:
Caret ^ matches at the beginning of the text being searched, and, if in an enhanced line-anchor mode, after any newline. [...]
- Concerning $, the description is more obscure to me:
$ [...] matches at the end of the target string, and before a string-ending newline, as well. The latter is common, to allow an expression like s$ (ostensibly, to match "a line ending with s") to match …s<NL>, a line ending with s that's capped with an ending newline.
Two other common meanings for $ are to match only at the end of the target text, and to match before any newline.
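The latter two meanings seem fairly symmetrical to those described for ^, but what about the string-ending newline meaning?
Searching for [regex] "string-ending newline" currently returns only three results, all of which refer to this description:
$ Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.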
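The point is that $ matches both before a trailing newline and at the end of the file or input string, which may or may not end with a newline.
The zero-width assertion $ asserts the position at the end of the string, or before the line terminator right at the end of the string (if any).
These Perl snippets should make it clearer: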
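# Case 1: no newline at the end of the string.
$str = 'abc
foo';
$str =~ s/\w+$/#/;
print "1. <" . $str . ">\n\n";

# Case 2: exactly one newline at the end of the string.
$str = 'abc
foo
';
$str =~ s/\w+$/#/;
print "2. <" . $str . ">\n\n";

# Case 3: two newlines at the end ("abc\nfoo\n\n"), so the newline
# after foo is not the last character of the string.
$str = 'abc
foo

';
$str =~ s/\w+$/#/;
print "3. <" . $str . ">\n\n";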
This generates the following output:
1. <abc
#>

2. <abc
#
>

3. <abc
foo

>
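As you can see, $ matches in cases 1 and 2, because $ matches at the end of the string (case 1) or before the line break right at the end (case 2). Case 3, however, remains unmatched because the line break after foo is not at the end of the string.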
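Please see whether the following code helps clarify the meaning of $ in a regular expression; substitutions on \n are added for comparison.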
use strict;
use warnings;
use feature 'say';
my $str = 'abc
foo
bar
';
my $str_test;
# Without /s, . does not match a newline, so (.+) stays on one line.
$str_test = $str;
$str_test =~ s/(.+)$/[$1]/;
say '-' x 30;
say ' regex :: s/(.+)$/[$1]/';
say '-' x 30;
say $str_test;
# With /s, . also matches newlines, so (.+) captures the whole string.
$str_test = $str;
$str_test =~ s/(.+)$/[$1]/s;
say '-' x 30;
say ' regex :: s/(.+)$/[$1]/s';
say '-' x 30;
say $str_test;
# Without /g, only the first newline is replaced.
$str_test = $str;
$str_test =~ s/\n/[NL]\n/s;
say '-' x 30;
say ' regex :: s/\n/[NL]\n/s';
say '-' x 30;
say $str_test;
# With /g, every newline is replaced.
$str_test = $str;
$str_test =~ s/\n/[NL]\n/g;
say '-' x 30;
say ' regex :: s/\n/[NL]\n/g';
say '-' x 30;
say $str_test;
Output:
------------------------------
 regex :: s/(.+)$/[$1]/
------------------------------
abc
foo
[bar]

------------------------------
 regex :: s/(.+)$/[$1]/s
------------------------------
[abc
foo
bar
]
------------------------------
 regex :: s/\n/[NL]\n/s
------------------------------
abc[NL]
foo
bar

------------------------------
 regex :: s/\n/[NL]\n/g
------------------------------
abc[NL]
foo[NL]
bar[NL]

"String-ending newline" means a newline that is the last character of the string.
Without /m, $ matches before a newline at the end of the string, and at the end of the string.
"abc\ndef\n" =~ /^abc$/ # Doesn't match at embedded line feed
"abc\ndef\n" =~ /^abc\n$/ # Doesn't match after embedded line feed
"abc\ndef\n" =~ /^abc\ndef$/ # Matches at string-ending line feed
"abc\ndef\n" =~ /^abc\ndef\n$/ # Matches at end of string
Without /m, $ is equivalent to \Z, which in turn is equivalent to (?=\n\z|\z).
With /m, $ matches before any newline, and at the end of the string.
"abc\ndef\n" =~ /^abc$/ # Matches at embedded line feed
"abc\ndef\n" =~ /^abc\n$/ # Doesn't match after embedded line feed
"abc\ndef\n" =~ /^abc\ndef$/ # Matches at string-ending line feed
"abc\ndef\n" =~ /^abc\ndef\n$/ # Matches at end of string
With /m, $ is equivalent to (?=\n|\z).
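The equivalences stated for $ with and without /m can be checked by listing every offset at which each zero-width pattern matches. This is only a minimal sketch: the helper name match_positions and the sample string are chosen here for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): return the offsets at which
# a zero-width pattern matches in $str, joined by spaces.
sub match_positions {
    my ($re, $str) = @_;
    my @offsets;
    # With /g, each successful match records its start offset in $-[0].
    push @offsets, $-[0] while $str =~ /$re/g;
    return "@offsets";
}

# Line feeds sit at offsets 3 and 7; offset 8 is the end of the string.
my $str = "abc\ndef\n";

# Without /m: $, \Z and the lookahead should report the same offsets.
print 'no /m, $           : ', match_positions(qr/$/,            $str), "\n";
print 'no /m, \Z          : ', match_positions(qr/\Z/,           $str), "\n";
print 'no /m, (?=\n\z|\z) : ', match_positions(qr/(?=\n\z|\z)/,  $str), "\n";

# With /m: $ should also match before the embedded line feed.
print '/m,    $           : ', match_positions(qr/$/m,           $str), "\n";
print '/m,    (?=\n|\z)   : ', match_positions(qr/(?=\n|\z)/,    $str), "\n";
```

The first three lines should list offsets 7 and 8, and the last two should list 3, 7 and 8, which matches the comments on the "abc\ndef\n" examples.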
\z is used for exact matching.
/xyz\z/ # String ends with "xyz"
$ is used to ignore a trailing newline.
/xyz$/ # Line ends with "xyz". The string might end with a line feed.
For example,
"jkl" =~ /^jkl$/ # Matches at end of string
"jkl" =~ /^jkl\z/ # Matches at end of string
"jkl\n" =~ /^jkl$/ # Matches at string-ending line feed
"jkl\n" =~ /^jkl\z/ # Doesn't match at string-ending line feed
$ is useful when matching lines you haven't chomped yet.
while (<>) {
next if /^foo$/;
...
}
\z is useful the rest of the time.
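One situation where the difference shows up is validating a value that must not carry a trailing newline. The sketch below uses two hypothetical validators, is_digits_dollar and is_digits_z, invented for this illustration; they differ only in the closing anchor.

```perl
use strict;
use warnings;

# Hypothetical validators (names invented here): accept digit-only strings.
# The $ version tolerates one trailing line feed; the \z version does not.
sub is_digits_dollar { return $_[0] =~ /^\d+$/  ? 1 : 0 }
sub is_digits_z      { return $_[0] =~ /^\d+\z/ ? 1 : 0 }

for my $input ("123", "123\n", "123\n\n") {
    (my $shown = $input) =~ s/\n/\\n/g;    # make line feeds visible
    print qq{"$shown": },
          'with $ -> ',  is_digits_dollar($input),
          ', with \z -> ', is_digits_z($input), "\n";
}
```

"123" passes both checks, "123\n" passes only the $ version, and "123\n\n" passes neither, because its first line feed is not the last character of the string.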
Note that other regex engines may behave differently, even Perl-like ones. For example, in JavaScript, $ without /m matches only at the end of the string.