How to make a PHP regex option group not eager?

I have a regular expression that looks for a pattern ending in an option group of n-grams. Here is the regex:

$regex = '/.{0,150}\b(is (.{0,50}?)\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))\b([^.!?<>]{0,150})\b/i';

Here is the string I am matching against:

$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';

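The goal is to extract "is a Distributor, Fabricator, and Manufacturer" with the regex's first capture group. The rest of the regex only defines context, which ideally ends at the end of the sentence or after a certain length.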

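Right now, my first capture group is eager and only matches "is a Distributor". How can I make it not eager, so that it keeps going through the last keyword in the list?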

.{0,150}\b(is (.{0,50}?)\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency)(.*?\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))*)\b([^.!?<>]{0,150})\b

This very long regex does it. See the demo:

https://regex101.com/r/sJ9gM7/39
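In PHP, the long pattern could be tried out roughly as follows. This is only a sketch: it assumes the case-insensitive `i` flag (the sample string capitalizes the keywords), and the `$keywords` / `$kw` variables are my own shorthand so the list does not have to be pasted twice; they are not part of the pattern above.

```php
<?php
// Keyword alternation, built once and reused (hypothetical helper variables).
$keywords = [
    'assembler', 'builder', 'consulter', 'contracter', 'contractor',
    'contract manufacturer', 'converter', 'designer', 'distributer',
    'distributor', 'engineerer', 'fabricater', 'fabricator', 'formulater',
    'formulator', 'installer', 'machiner', 'manufacturer', 'offerer',
    'producer', 'provider', 'reseller', 'seller', 'supplier', 'wholesaler',
    'machine shop', 'job shop', 'law firm', 'marketer', 'marketing agency',
];
$kw = implode('|', $keywords);

// Long version: the first keyword, then zero or more repetitions of
// "lazily anything up to the next keyword". The "i" flag is an assumption.
$regex = '/.{0,150}\b(is (.{0,50}?)\b(' . $kw . ')(.*?\b(' . $kw . '))*)\b([^.!?<>]{0,150})\b/i';

$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';

// Group 1 holds the "is ... <last keyword>" span when the pattern matches.
$group1 = preg_match($regex, $string, $m) ? $m[1] : null;
echo $group1, PHP_EOL; // is a Distributor, Fabricator, and Manufacturer
```

One thing to keep in mind: the `.*?` inside the repeated group also matches periods, so a keyword appearing in a later sentence could extend the capture past the end of the current one.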

A shorter version without the repetition (not in code tags, because it is unreadable on one line):

.{0,150}\b(is([^.!?<>]{0,50}(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))+)\b([^.!?<>]{0,150}\b)

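The idea is to allow a prefix of no more than 50 characters before each keyword (luckily there is only one such constant, so it is easy to find), whether or not it is a further keyword in an enumeration. To capture the whole enumeration, I added +) after the keyword list.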
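To see what the added + buys you, here is a small PHP sketch that runs the shorter pattern with and without it. The `i` flag, the helper function `captureGroup1`, and the second test sentence are assumptions of mine for illustration; only the pattern shape comes from the answer above.

```php
<?php
// Keyword alternation, built once (hypothetical helper variables).
$keywords = [
    'assembler', 'builder', 'consulter', 'contracter', 'contractor',
    'contract manufacturer', 'converter', 'designer', 'distributer',
    'distributor', 'engineerer', 'fabricater', 'fabricator', 'formulater',
    'formulator', 'installer', 'machiner', 'manufacturer', 'offerer',
    'producer', 'provider', 'reseller', 'seller', 'supplier', 'wholesaler',
    'machine shop', 'job shop', 'law firm', 'marketer', 'marketing agency',
];
$kw = implode('|', $keywords);

// Short version: one or more "up to 50 non-terminator characters + keyword".
$withPlus    = '/.{0,150}\b(is([^.!?<>]{0,50}(' . $kw . '))+)\b([^.!?<>]{0,150}\b)/i';
// Same pattern without the "+", for comparison only.
$withoutPlus = '/.{0,150}\b(is([^.!?<>]{0,50}(' . $kw . ')))\b([^.!?<>]{0,150}\b)/i';

// Hypothetical helper: returns capture group 1, or null when there is no match.
function captureGroup1(string $regex, string $subject): ?string
{
    return preg_match($regex, $subject, $m) ? $m[1] : null;
}

// The sentence from the question.
$short = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';
// A made-up sentence whose enumeration spans well over 50 characters in total.
$long = 'Acme is a very well known regional Distributor, a precision sheet-metal Fabricator, and a Manufacturer of parts.';

$a = captureGroup1($withPlus, $short);
$b = captureGroup1($withPlus, $long);
$c = captureGroup1($withoutPlus, $long);

echo $a, PHP_EOL; // is a Distributor, Fabricator, and Manufacturer
echo $b, PHP_EOL; // is a very well known regional Distributor, a precision sheet-metal Fabricator, and a Manufacturer
echo $c, PHP_EOL; // is a very well known regional Distributor
```

In the question's sentence the whole enumeration fits inside a single 50-character prefix, so one repetition is enough. The + only matters once the keywords are spread further apart, as in the second sentence, where every keyword gets its own 50-character allowance.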

Check it here.