RegEx for parsing "Diagnostic-Code" in a bounced e-mail

Question

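I am trying to read bounced emails by connecting to an IMAP account through PHP and fetching all of its messages. I want to retrieve the "Diagnostic-Code" message from each email, and I have written the following regex: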

/Diagnostic-Code:\s+?(.*)/i

The message I want to parse looks like this:

Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does
    not exist. Please try 550-5.1.1 double-checking the recipient's email
    address for typos or 550-5.1.1 unnecessary spaces. Learn more at 550 5.1.1
    https://support.google.com/mail/?p=NoSuchUser 63si4621095ybi.465 - gsmtp

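The regex only partly works: it captures just the first line of text. I would like to capture the whole message, i.e. all four lines.

Can the expression be updated to match this?

Thanks.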

Answer 1

Add the s flag:

/Diagnostic-Code:\s+?(.*)/si

From this question:

In PHP... [t]he s at the end causes the dot to match all characters including newlines.

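With the s flag, .* runs to the end of the input, so your regex captures everything after Diagnostic-Code: (see this regex101). If more text follows the message, remember to add something that ends the match.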
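
One possible way to end the match, sketched under the assumption that the continuation lines of the field are indented with a space or tab (as in the sample from the question): make the group lazy and stop at the first line break that is not followed by whitespace. The Last-Attempt-Date line and the boundary in $body are made-up sample data, and the pattern is an illustration rather than a drop-in fix.

```php
<?php
// Sample bounce fragment. The header after the diagnostic text and the
// boundary line are assumptions added for illustration.
$body = "Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does\n"
      . "    not exist. Please try 550-5.1.1 double-checking the recipient's email\n"
      . "    address for typos or 550-5.1.1 unnecessary spaces.\n"
      . "Last-Attempt-Date: Mon, 01 Jan 2018 00:00:00 -0800\n"
      . "\n"
      . "--boundary--\n";

// s: the dot also matches newlines.
// (.*?) is lazy and stops at the first line break that is NOT followed by a
// space or tab (so not an indented continuation line), or at the end of input.
$pattern = '/Diagnostic-Code:\s+(.*?)(?=\R(?![ \t])|\z)/si';

$diagnostic = null;
if (preg_match($pattern, $body, $m)) {
    // Collapse line breaks and indentation into single spaces.
    $diagnostic = preg_replace('/\s+/', ' ', trim($m[1]));
}

echo $diagnostic, PHP_EOL;
```

Here $diagnostic holds the unfolded diagnostic text on a single line, without the header that follows it.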

Answer 2

/Diagnostic-Code:\s(.*\n(?:(?!--).*\n)*)/i

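The result will be in capture group 1.
The first .*\n matches the first line, including the trailing newline.
(?:(?!--).*\n)* matches the following lines that do not start with "--".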
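
A minimal sketch of how this pattern might be used with preg_match. The $report string is a shortened version of the message from the question, and the boundary line is hypothetical. Note that every line up to the first one starting with "--" is captured, including the blank line, so the result is trimmed and unfolded afterwards.

```php
<?php
// Shortened sample; the MIME boundary below is a hypothetical value.
$report = "Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does\n"
        . "    not exist. Learn more at 550 5.1.1\n"
        . "    https://support.google.com/mail/?p=NoSuchUser 63si4621095ybi.465 - gsmtp\n"
        . "\n"
        . "--000000000000example--\n";

$pattern = '/Diagnostic-Code:\s(.*\n(?:(?!--).*\n)*)/i';

$code = null;
if (preg_match($pattern, $report, $matches)) {
    // Group 1 holds the raw multi-line text; collapse it to a single line.
    $code = trim(preg_replace('/\s+/', ' ', $matches[1]));
}

echo $code, PHP_EOL;
```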

Answer 3

If there can be multiple messages that start with Diagnostic-Code:, you can use:

^Diagnostic-Code:\K.*(?:\R(?!Diagnostic-Code:).*)*

See a regex demo | PHP demo

Explanation

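^ Start of the string (with the m flag, the start of each line, which is needed to find more than one message)
Diagnostic-Code: Match literally
\K.* Forget what has been matched so far, then match the rest of the line
(?: Non-capturing group
- \R(?!Diagnostic-Code:).* Match a Unicode newline sequence, then use a negative lookahead to assert that what follows is not Diagnostic-Code:. If so, match the rest of that line
)* Close the non-capturing group and repeat it 0+ times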
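
As a rough illustration, the pattern could be applied with preg_match_all, assuming the m flag is set so that ^ matches at the start of every line. The second Diagnostic-Code entry in $text is invented sample data.

```php
<?php
// Two entries; the second one is a hypothetical example.
$text = "Diagnostic-Code: smtp; 550-5.1.1 The email account that you tried to reach does\n"
      . "    not exist.\n"
      . "Diagnostic-Code: smtp; 552 5.2.2 Mailbox full\n";

// m: ^ matches at the start of each line, not only at the start of the string.
$pattern = '/^Diagnostic-Code:\K.*(?:\R(?!Diagnostic-Code:).*)*/m';

preg_match_all($pattern, $text, $matches);

// Because of \K, each full match ($matches[0]) contains only the text after
// "Diagnostic-Code:". Unfold each one into a single trimmed line.
$codes = array_map(
    function ($raw) {
        return trim(preg_replace('/\s+/', ' ', $raw));
    },
    $matches[0]
);

print_r($codes);
```

Each element of $codes then corresponds to one Diagnostic-Code entry.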

php

regex

email-bounces
