PHP Lookaround: Get all text until it finds a certain string

Question

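I want to capture text up to the point where a specific match is found.

For example:

I want to get all the text that comes before the word the.

Currently I have this pattern: /([[:alnum:]|\s|.]*)(?!the)/ui

Applied to this text: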

this is completely customizable through the dashboard. This is a separate area from the main c

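The problem is that the first group matches the whole line and does not stop when it reaches the word the. What I expect is:

Match 1: this is completely customizable through
Match 2: dashboard. This is a separate area from

What am I doing wrong?

Here is a sample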

Answer 1

Use the non-greedy *? instead of just *.

Like this:

.*?(?=the)

Compare this: .*?(?=the)

with this: .*(?=the)
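To make the comparison concrete, here is a minimal sketch that runs both patterns against the sample text from the question. The variable names are only illustrative.

```php
<?php
// Sample text from the question.
$str = "this is completely customizable through the dashboard. This is a separate area from the main c";

// Lazy: the dot expands one character at a time and stops before the first "the".
preg_match('/.*?(?=the)/', $str, $lazy);

// Greedy: the dot grabs the whole line, then backtracks only to the last "the".
preg_match('/.*(?=the)/', $str, $greedy);

var_dump($lazy[0]);   // "this is completely customizable through "
var_dump($greedy[0]); // "this is completely customizable through the dashboard. This is a separate area from "
```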

Answer 2

You just need to use lazy matching and a lookahead:

/.+?(?=\bthe\b)/s

See the regex demo; the matches are

this is completely customizable through 
the dashboard. This is a separate area from

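The s modifier is also used to force . to match newlines. Lazy matching means the search stops at the closest the, and \b ensures that the whole word the is found rather than the as part of a word such as theater.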
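The effect of \b can be seen in a small sketch. The input string here is a made-up example (not from the question), chosen because it contains theater before a standalone the.

```php
<?php
// Hypothetical input: "theater" embeds the letters "the".
$str = "we went to theater near the park";

// Without word boundaries, the lookahead is satisfied by "the" inside "theater".
preg_match('/.+?(?=the)/s', $str, $loose);

// With \b on both sides, only the standalone word "the" qualifies.
preg_match('/.+?(?=\bthe\b)/s', $str, $strict);

var_dump($loose[0]);  // "we went to "
var_dump($strict[0]); // "we went to theater near "
```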

Lazy matching, as described on rexegg.com:

The lazy .*? guarantees that the quantified dot only matches as many characters as needed for the rest of the pattern to succeed.

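Your ([[:alnum:]|\s|.]*) regex is slightly off, because inside a character class | is treated as a literal pipe symbol and . as a literal dot, so the class matches only alphanumerics, whitespace, pipes, and dots. If the intent was "any character", write (?:\s|.)* or just .* with the /s (dotall, singleline) modifier. But since the quantifier is greedy (that is, it matches as many characters as possible while looking for a match), it will only stop before the last the. So you need *? instead of *: lazy matching.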
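As a quick check of how metacharacters behave inside a character class in PCRE, the sketch below tests single characters against a bracketed class and against a bare dot. The variable names are illustrative.

```php
<?php
// Inside brackets, "|" and "." are literal characters.
$pipeInClass   = preg_match('/^[|.]$/', '|'); // 1: literal pipe
$dotInClass    = preg_match('/^[|.]$/', '.'); // 1: literal dot
$letterInClass = preg_match('/^[|.]$/', 'a'); // 0: "a" is neither

// Outside brackets, "." matches any single character except a newline.
$letterBareDot = preg_match('/^.$/', 'a');    // 1

// The class from the question therefore rejects characters such as "!".
$bangInQuestionClass = preg_match('/^[[:alnum:]|\s|.]$/', '!'); // 0

var_dump($pipeInClass, $dotInClass, $letterInClass, $letterBareDot, $bangInQuestionClass);
```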

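Since you are probably not interested in empty matches, * (0 or more occurrences) should be replaced with + (1 or more occurrences of the preceding subpattern).

Here is a PHP demo: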

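// Lazily match one or more characters, up to (but not including) the whole word "the".
$re = '/.+?(?=\bthe\b)/s';
$str = "this is completely customizable through the dashboard. This is a separate area from the main c";
preg_match_all($re, $str, $matches);
// $matches[0] holds the full-pattern matches.
print_r($matches[0]);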

Answer 3

You should use the ungreedy modifier U (uppercase).

Also, try using just "the" in the second group:

/([[:alnum:]|\s|.]*)(the)/Ui

Check this out:

https://regex101.com/r/cF3iK0/1
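For reference, a minimal sketch of this pattern in PHP, run against the sample text from the question. Because the second group consumes the, the first group of each match holds only the text in front of it.

```php
<?php
// Sample text from the question.
$str = "this is completely customizable through the dashboard. This is a separate area from the main c";

// U inverts quantifier greediness, so * behaves like *? in this pattern.
preg_match_all('/([[:alnum:]|\s|.]*)(the)/Ui', $str, $m);

// $m[1] collects the first capturing group from every match.
print_r($m[1]);
// [0] => "this is completely customizable through "
// [1] => " dashboard. This is a separate area from "
```

Note that U flips the meaning of every quantifier in the pattern, so an explicit *? is often easier to read when only one quantifier should be lazy.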

Answer 4

Since you want to exclude the word the, the best way is to split the string rather than trying to match everything up to that word:

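$str = "this is completely customizable through the dashboard. This is a separate area from the main c";
// Split on the whole word "the", case-insensitively.
$result = preg_split('~\bthe\b~i', $str);
// Drop the trailing piece, which is not followed by "the".
array_pop($result);
print_r($result);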

You need to remove the last item with array_pop because it is not followed by the.

By the way, (?!...) means "not followed by" and (?=...) means "followed by".
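The difference between the two assertions can be sketched as follows. The test strings are made up for illustration; the last call shows why a greedy pattern followed by (?!the) swallows the whole line.

```php
<?php
// (?=bar) succeeds only when "bar" comes next.
$positive = preg_match('/foo(?=bar)/', 'foobar');      // 1

// (?!bar) succeeds only when "bar" does NOT come next.
$negativeFail = preg_match('/foo(?!bar)/', 'foobar');  // 0
$negativeOk   = preg_match('/foo(?!bar)/', 'foobaz');  // 1

// Greedy .* runs to the end of the line, where nothing follows,
// so the negative lookahead is trivially satisfied.
preg_match('/.*(?!the)/', 'stop before the end', $m);

var_dump($positive, $negativeFail, $negativeOk, $m[0]); // $m[0] is "stop before the end"
```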

php

regex

regex-lookarounds