How to use regex to include line breaks in extracted results

I'm working with a text file of messages similar to this (though much longer):

13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you
Hello
13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message
where someone added a line break
13/09/18, 4:10 pm - Fred Dag: Here is another message

The following regex extracts the data into date, time, name and message, except where the message contains a line break:

(?<date>(?:[0-9]{1,2}\/){2}[0-9]{1,2}),\s(?<time>(?:[0-9]{1,2}:)[0-9]{2}\s[a|p]m)\s-\s(?<name>(?:.*)):\s(?<message>(?:.+))

Using preg_match_all with the regex above in PHP 7.4 produces the following array:

Array
(
    [0] => Array
        (
            [date] => 13/09/18
            [time] => 4:14 pm
            [name] => Fred Dag
            [message] => Jackie, please could you send to me too? ‚ thank you
        )

    [1] => Array
        (
            [date] => 13/09/18
            [time] => 4:45 pm
            [name] => Jackie Johnson
            [message] => Here is yet another message
        )

    [2] => Array
        (
            [date] => 13/09/18
            [time] => 4:10 pm
            [name] => Fred Dag
            [message] => Here is another message
        )

)

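But the array is missing the lines that follow a line break, which should be appended to the previous message. I get the same result when playing around on regex101.com.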

I think I've exhausted my knowledge of regex and reached the end of Google with the search terms I know :) Can anyone point me in the right direction?

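Your immediate problem appears to be that the dot you use to match the message content does not match newlines. In PHP regex this is easily fixed with the /s (dot-all) flag. Adding the flag alone is not enough, though: the greedy .* and .+ in your pattern would then run across message boundaries, so I think the pattern itself needs to change as well. I suggest the following pattern: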

\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}.*?(?=\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}|$)

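This pattern matches a line that begins with a date and time, continues across line breaks, and stops when it reaches either the start of the next message or the end of the input.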
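
To see why the /s flag on its own is not enough, here is a minimal sketch that runs the original pattern from the question with only that flag added. The variable names are mine, and the input is the sample from the question. With dot-all enabled, the greedy .* in the name group runs to the end of the input and backtracks only as far as the last colon followed by whitespace, so the whole log should collapse into a single match:

```php
<?php
// Sample log from the question; two of the messages span two lines.
$input = "13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you\nHello\n13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message\nwhere someone added a line break\n13/09/18, 4:10 pm - Fred Dag: Here is another message";

// The question's original pattern, unchanged except for the added /s flag.
$original = '/(?<date>(?:[0-9]{1,2}\/){2}[0-9]{1,2}),\s(?<time>(?:[0-9]{1,2}:)[0-9]{2}\s[a|p]m)\s-\s(?<name>(?:.*)):\s(?<message>(?:.+))/s';

$count = preg_match_all($original, $input, $m);

echo $count, "\n";            // number of matches found
echo $m['message'][0], "\n";  // only the text after the last "name: " ends up here
```

This is why the suggested pattern uses a lazy .*? together with a lookahead for the next timestamp.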

Sample script:

// Sample input: two of the three messages contain a line break.
$input = "13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you\nHello\n13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message\nwhere someone added a line break\n13/09/18, 4:10 pm - Fred Dag: Here is another message";
// The /s flag lets the dot match newlines; the lookahead stops each match before the next timestamp or at the end of the input.
preg_match_all("/\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}.*?(?=\d{2}\/\d{2}\/\d{2}, \d{1,2}:\d{1,2}|$)/s", $input, $matches);
print_r($matches[0]);

This prints:

Array
(
    [0] => 13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you
    Hello

    [1] => 13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message
    where someone added a line break

    [2] => 13/09/18, 4:10 pm - Fred Dag: Here is another message
)
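
If you also want the date, time, name and message as separate fields, the named groups from the question can be combined with the same lookahead idea. The following is a sketch under some assumptions rather than a drop-in solution: it assumes names never contain a colon and that times always end in am or pm, and the names $stamp, $pattern and $messages are hypothetical. The lookahead here stops just before the line break that precedes the next timestamp, so the captured message carries no trailing newline.

```php
<?php
// Sample log from the question; two of the messages span two lines.
$input = "13/09/18, 4:14 pm - Fred Dag: Jackie, please could you send to me too? ‚ thank you\nHello\n13/09/18, 4:45 pm - Jackie Johnson: Here is yet another message\nwhere someone added a line break\n13/09/18, 4:10 pm - Fred Dag: Here is another message";

// Assumed shape of the prefix that starts every message: "d/m/yy, h:mm am - ".
$stamp = '\d{1,2}/\d{1,2}/\d{2}, \d{1,2}:\d{2} [ap]m - ';

// "~" is used as the delimiter so the slashes in the date need no escaping.
// The lazy message group ends before a line break followed by the next
// timestamp, or before optional trailing whitespace at the end of the input.
$pattern = '~(?<date>\d{1,2}/\d{1,2}/\d{2}), (?<time>\d{1,2}:\d{2} [ap]m) - (?<name>[^:\n]+): '
         . '(?<message>.*?)(?=\R' . $stamp . '|\s*\z)~s';

// PREG_SET_ORDER groups the captures per message instead of per group.
preg_match_all($pattern, $input, $sets, PREG_SET_ORDER);

// Keep only the named fields, dropping the numeric duplicates.
$messages = [];
foreach ($sets as $set) {
    $messages[] = [
        'date'    => $set['date'],
        'time'    => $set['time'],
        'name'    => $set['name'],
        'message' => $set['message'],
    ];
}

print_r($messages);
```

With the sample input this should yield three entries, with "Hello" and "where someone added a line break" kept inside the messages they belong to.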