Why does this multi-line regular expression include the following line?

I have the following input, and I want to write a regular expression that matches every line except the first and the last.

2019-03-13 00:33:44,846 [INFO] -:  foo
2019-03-13 00:33:45,096 [INFO] -:  Exception sending email
To:
[foo@bar.com, bar@bar.com]
CC:
[baz@bar.com]
Subject:
some subject
Body:
some

body
2019-03-13 00:33:45,190 [INFO] -:  bar

I thought the following should work, but it doesn't match anything:

pcregrep -M ".+Exception sending email[\S\s]+?(?=\d{4}(-\d\d){2})" ~/test.log

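In plain English, I would describe it as: find the line containing the exception text, then non-greedily match any characters (including newlines) until a positive lookahead finds a date.

For some reason, this also includes the last line, even though it doesn't on regex101. What am I missing here?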


In many cases I would use grep -A for this kind of thing, but the problem is that the body can be any number of lines.

This is almost certainly down to the tool. As the changelog for pcregrep states under "Version 8.12 15-Jan-2011":

  1. In pcregrep, when a pattern that ended with a literal newline sequence was matched in multiline mode, the following line was shown as part of the match. This seems wrong, so I have changed it.

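A simple workaround is to add a newline to the lookahead expression. Because a lookahead does not consume anything, this moves the trailing newline out of the match, so the match no longer ends with a newline and the last line is not displayed: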

pcregrep -M ".+Exception sending email[\S\s]+?(?=[\r\n]\d{4}(-\d\d){2})" ~/test.log
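
To see where each pattern actually stops, here is a minimal sketch using Python's `re` module as a stand-in for the regex engine. This is an assumption for illustration only: Python is not pcregrep and does not reproduce its line-printing behaviour. It only shows that the original pattern's match ends with a newline, while the workaround's match ends just before it. The `log` string is the sample input from the question.

```python
import re

# Sample input from the question, as a single string.
log = (
    "2019-03-13 00:33:44,846 [INFO] -:  foo\n"
    "2019-03-13 00:33:45,096 [INFO] -:  Exception sending email\n"
    "To:\n"
    "[foo@bar.com, bar@bar.com]\n"
    "CC:\n"
    "[baz@bar.com]\n"
    "Subject:\n"
    "some subject\n"
    "Body:\n"
    "some\n"
    "\n"
    "body\n"
    "2019-03-13 00:33:45,190 [INFO] -:  bar\n"
)

# Original pattern: the lookahead starts directly at the date.
original = re.compile(r".+Exception sending email[\S\s]+?(?=\d{4}(-\d\d){2})")
# Workaround: the lookahead also requires the newline before the date.
fixed = re.compile(r".+Exception sending email[\S\s]+?(?=[\r\n]\d{4}(-\d\d){2})")

m1 = original.search(log)
m2 = fixed.search(log)

# Show only the tail of each match to compare where they end.
print(repr(m1.group(0)[-6:]))
print(repr(m2.group(0)[-6:]))
```

In both cases the text of the last log line is outside the match itself. The only difference is the final newline, which is what the changelog entry describes as triggering the extra line in pcregrep's output.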