Difference in matching end of line with PHP regex
Given the code:
$my_str = '
Rollo is*
My dog*
And he\'s very*
Lovely*
';
preg_match_all('/\S+(?=\*$)/m', $my_str, $end_words);
print_r($end_words);
In PHP 7.3.2 (XAMPP) I get this unexpected output:
Array ( [0] => Array ( ) )
whereas in PhpFiddle, on PHP 7.0.33, I get the expected result:
Array ( [0] => Array ( [0] => is [1] => dog [2] => very [3] => Lovely ) )
Why am I getting this difference? Did regex behavior change after 7.0.33?
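It seems that, in your environment, the PCRE library was compiled without the PCRE_NEWLINE_ANY option, so in multiline mode $ matches only before the LF symbol and . matches any symbol except LF. If the script was saved with CRLF line endings (likely on Windows with XAMPP), every * in the string is followed by CR rather than LF, so the lookahead (?=\*$) can never succeed.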
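One way to check whether stray CR characters are the cause is to inspect the raw bytes of the string. The following is a minimal sketch, not part of the original answer; it assumes the subject was saved with CRLF line endings and spells them out explicitly as "\r\n" so the result does not depend on the editor.

```php
<?php
// Assumption: the question's text with explicit CRLF line endings,
// as a file saved on Windows would likely contain.
$my_str = "\r\nRollo is*\r\nMy dog*\r\nAnd he's very*\r\nLovely*\r\n";

// Count the CR characters in the subject.
$cr_count = substr_count($my_str, "\r");

// Dump the first "*" and the two bytes that follow it as hex:
// 2a is "*", 0d is CR, 0a is LF.
$after_star = substr($my_str, strpos($my_str, '*'), 3);
$hex = bin2hex($after_star);

echo "CR characters: $cr_count\n";
echo "Bytes at first line end: $hex\n";
```

If the hex dump shows 0d directly after 2a, the * is followed by CR, which an LF-only $ does not treat as the end of a line.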
You can fix it by using the PCRE (*ANYCRLF) verb:
'~(*ANYCRLF)\S+(?=\*$)~m'
(*ANYCRLF) specifies a newline convention that covers (*CR), (*LF), and (*CRLF), and is equivalent to the PCRE_NEWLINE_ANYCRLF option. See the PCRE documentation:
PCRE_NEWLINE_ANYCRLF specifies that any of the three preceding sequences should be recognized.
In the end, this PCRE verb lets . match any character except CR and LF, and $ will match right before either of these two characters.
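The effect of the verb can be reproduced side by side. This sketch assumes a subject with explicit CRLF line endings and uses (*LF) to pin the LF-only convention, so the comparison does not depend on how the local PCRE library was built; the variable names are illustrative.

```php
<?php
// Assumption: the question's text with explicit CRLF line endings.
$my_str = "\r\nRollo is*\r\nMy dog*\r\nAnd he's very*\r\nLovely*\r\n";

// LF-only convention: "$" cannot match before CR, so the lookahead fails.
preg_match_all('~(*LF)\S+(?=\*$)~m', $my_str, $lf_only);

// ANYCRLF convention: "$" matches before CR, LF, or CRLF.
preg_match_all('~(*ANYCRLF)\S+(?=\*$)~m', $my_str, $any_crlf);

print_r($lf_only[0]);   // empty array
print_r($any_crlf[0]);  // is, dog, very, Lovely
```

The second call returns the same four words the question reports for PHP 7.0.33.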
See more about this and other verbs at rexegg.com:
By default, when PCRE is compiled, you tell it what to consider to be a line break when encountering a . (as the dot it doesn't match line breaks unless in dotall mode), as well the ^ and $ anchors' behavior in multiline mode. You can override this default with the following modifiers:
✽ (*CR): Only a carriage return is considered to be a line break
✽ (*LF): Only a line feed is considered to be a line break (as on Unix)
✽ (*CRLF): Only a carriage return followed by a line feed is considered to be a line break (as on Windows)
✽ (*ANYCRLF): Any of the above three is considered to be a line break
✽ (*ANY): Any Unicode newline sequence is considered to be a line break
For instance, (*CR)\w+.\w+ matches Line1\nLine2 because the dot is able to match the \n, which is not considered to be a line break. See the demo.
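The quoted (*CR) example can be tried in PHP as well. This is a small sketch with illustrative variable names; it contrasts (*CR) with (*LF) on the same subject to show how the newline convention changes what the dot may match.

```php
<?php
$subject = "Line1\nLine2";

// (*CR): only CR counts as a line break, so the dot may match LF
// and the match spans both lines.
preg_match('~(*CR)\w+.\w+~', $subject, $m_cr);

// (*LF): LF is the line break, so the dot cannot match it and the
// match has to stay inside the first line.
preg_match('~(*LF)\w+.\w+~', $subject, $m_lf);

var_dump($m_cr[0] === "Line1\nLine2"); // bool(true)
var_dump($m_lf[0]);                    // string(5) "Line1"
```

Under (*LF) the pattern still matches, but only by backtracking so that the dot consumes a letter within Line1 instead of the line feed.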