How does Perl regexp anchor $ actually handle a trailing newline?

Question

I recently came across some unexpected behavior of the end-of-string anchor $ in Perl regular expressions (Perl 5.26.1 x86_64 on OpenSuse 15.2).

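Supposedly, $ refers to the end of the string, not the end of a line as in grep(1). So I expected that a \n at the end of the string would have to be matched explicitly. However, the following (complete) program: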

my @strings = ( 
  "hello world",
  "hello world\n",
  "hello world\t"
);
my $i = 0;
foreach (@strings) {
  $i++;
  print "$i: >>$_<<\n" if /d$/;
}

produces this output:

1: >>hello world<<
2: >>hello world
<<

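That is, /d$/ matches not only the first of the three strings but also the second one, which ends in a newline. On the other hand, as expected, the regexp /d\n$/ matches only the second string, and /d\s$/ matches the second and the third.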
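A minimal sketch that runs all three patterns against the three strings in one go, to make the observation above easy to reproduce. The helper name `matching_indices` is hypothetical and not part of the original post.

```perl
use strict;
use warnings;

# The three test strings from the question
my @strings = ("hello world", "hello world\n", "hello world\t");

# Hypothetical helper: return the 1-based indices of the strings a pattern matches
sub matching_indices {
    my ($re) = @_;
    return grep { $strings[$_ - 1] =~ $re } 1 .. @strings;
}

# The three patterns discussed in the question, in the order they appear
for my $pair (['d$', qr/d$/], ['d\n$', qr/d\n$/], ['d\s$', qr/d\s$/]) {
    my ($name, $re) = @$pair;
    printf "/%s/ matches strings: %s\n", $name, join(', ', matching_indices($re));
}
```

With these inputs, /d$/ reports strings 1 and 2, /d\n$/ only string 2, and /d\s$/ strings 2 and 3.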

What's going on here?

Answer 1

perlre says of the $ metacharacter:

Match the end of the string
(or before newline at the end of the string;
or before any newline if /m is used)

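This means that a d immediately followed by a final \n (newline) matches /d$/: the $ matches at the position just before that trailing newline.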
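One way to see that the newline is not part of the match is to look at the match offsets in Perl's @- and @+ arrays. A small sketch; the sub name `d_anchor_span` is a hypothetical helper for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the start and end offsets of the /d$/ match, or ()
sub d_anchor_span {
    my ($s) = @_;
    # After a successful match, $-[0] is the start offset and $+[0] the end offset
    return $s =~ /d$/ ? ($-[0], $+[0]) : ();
}

for my $s ("hello world", "hello world\n") {
    my ($start, $end) = d_anchor_span($s);
    printf "length %2d: match covers offsets %d..%d, text '%s'\n",
        length $s, $start, $end, substr($s, $start, $end - $start);
}
```

Both strings report a match covering offsets 10..11 (just the "d"), even though the second string is one character longer: $ matched before the newline without consuming it.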

Answer 2

As already stated, the $ metacharacter does match the end of the string, but it allows for a trailing newline, so it also matches before a newline at the end of the string. Note that with the /m modifier it also matches before internal newlines in a multiline string.
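A short sketch of the /m difference, using a made-up three-line string whose lines all end in "d":

```perl
use strict;
use warnings;

my $text = "red\nold\nbad\n";    # three lines, each ending in "d"

# Without /m, $ matches only at the very end or before the final newline
my @plain = $text =~ /(\w+d)$/g;
# With /m, $ also matches before every internal newline
my @multi = $text =~ /(\w+d)$/mg;

print "without /m: @plain\n";
print "with /m:    @multi\n";
```

Without /m only "bad" is captured (it sits before the final newline); with /m all three words are.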

There are also ways to fine-tune exactly what is matched, using these assertions:

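\z matches only at the very end of the string, even with the /m flag, and never before a trailing newline
\Z matches only at the end of the string, even with the /m flag, but also before a newline at the end of the string. So it is like $, except that it never matches before newlines inside a multiline string, even with /m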
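To compare the anchors side by side, here is a sketch that tests /d$/, /d\Z/, /d\z/ and the /m variants of $ and \Z against three sample strings (the strings and the sub name `anchor_flags` are illustrative choices, not from the thread):

```perl
use strict;
use warnings;

# Patterns compared, in this order: $, \Z, \z, $ with /m, \Z with /m
my @patterns = (qr/d$/, qr/d\Z/, qr/d\z/, qr/d$/m, qr/d\Z/m);

# Hypothetical helper: one '1' or '0' per pattern, saying whether it matches $s
sub anchor_flags {
    my ($s) = @_;
    my $flags = '';
    for my $re (@patterns) {
        $flags .= $s =~ $re ? '1' : '0';
    }
    return $flags;
}

for my $s ("hello world", "hello world\n", "hello world\nsecond line") {
    (my $shown = $s) =~ s/\n/\\n/g;    # show newlines as a visible \n
    printf "%-26s %s\n", qq("$shown"), anchor_flags($s);
}
```

The flags come out as 11111, 11011 and 00010: \z is the only anchor that rejects the trailing newline, and only $ with /m matches before the internal newline in the last string.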

These are "zero-width" assertions: they match positions, not characters.
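Because the anchors are zero-width, substituting on them inserts text at a position instead of replacing anything. A small sketch that puts a "|" marker wherever each anchor first matches in "hello\n":

```perl
use strict;
use warnings;

my $s = "hello\n";

# Each substitution inserts "|" at the first position the anchor matches;
# nothing from $s is removed, because the anchors consume no characters
(my $with_dollar  = $s) =~ s/$/|/;     # before the final newline
(my $with_big_z   = $s) =~ s/\Z/|/;    # also before the final newline
(my $with_small_z = $s) =~ s/\z/|/;    # only at the true end, after the newline

for my $r ($with_dollar, $with_big_z, $with_small_z) {
    (my $shown = $r) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-10s length %d\n", $shown, length $r;
}
```

All three results are one character longer than the input, confirming that only a position was matched.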


regex

perl

end-of-line