Gather a repeating 2-group pattern

Question

I am looking for a regular expression that would return the following

The law of Huxley Something interesting. Some other interesting thing. The law of Dallas This thing is boring. The law of void Some stuff.

as 2 lines of text per entry, with 2 groups identified:

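the first group starts with "The law" and ends at the first capital letter;
the second group starts where the first one ends and runs until the next "The law" pattern is encountered.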

My goal is to rephrase the text by using capture groups to separate each title from its core text, like this:

The law of Huxley 
Something interesting. Some other interesting thing. 

The law of Dallas 
This thing is boring.

The law of void
Some stuff.

I tried

((The law [\w\s]+)([A-Z].+))+

without success.
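A quick way to see what this attempt does is to run it against the sample. The sketch below uses Python's standard `re` module rather than PCRE, an assumption made only to keep the example self-contained (the attempted pattern uses no PCRE-specific syntax). The variable names are illustrative.

```python
import re

text = (
    "The law of Huxley Something interesting. Some other interesting thing. "
    "The law of Dallas This thing is boring. The law of void Some stuff."
)

# The pattern from the question, unchanged.
attempt = re.compile(r"((The law [\w\s]+)([A-Z].+))+")

matches = list(attempt.finditer(text))
first = matches[0]
title, body = first.group(2), first.group(3)

print(len(matches))  # number of separate matches found
print(repr(title))   # what the "title" group captured
print(repr(body))    # what the "body" group captured
```

The pattern does match, but only once: the greedy `.+` in the last group runs to the end of the line, so the later "The law ..." titles end up inside the first body instead of starting new matches.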

Answer 1

You can use

(The law\s+\w+\s\P{Lu}*)(\p{Lu}.*?)(?=The law|$)

See the regex demo.

Details:

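(The law\s+\w+\s\P{Lu}*) - Group 1: the text The law, then one or more whitespace characters, one or more word characters, a single whitespace character, and then zero or more characters other than uppercase letters
(\p{Lu}.*?) - Group 2: an uppercase letter, then zero or more characters other than line break characters, as few as possible, up to the first occurrence of the subsequent subpattern
(?=The law|$) - a positive lookahead that requires The law or the end of the string to appear immediately to the right of the current location.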
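The same lazy-body-plus-lookahead idea can be sketched in Python. Two things here are assumptions, not part of the answer. First, Python's standard `re` module does not support `\p{Lu}`, so this variant avoids uppercase classes altogether. Second, in the sample as shown the names Huxley and Dallas are themselves capitalized, so a title that ends at the first uppercase letter would stop right after "The law of "; the sketch therefore assumes every title is "The law of" followed by exactly one word. The names `pattern`, `pairs`, and `rephrased` are illustrative.

```python
import re

text = (
    "The law of Huxley Something interesting. Some other interesting thing. "
    "The law of Dallas This thing is boring. The law of void Some stuff."
)

# Assumption: a title is "The law of" plus exactly one word.
# Group 1: the title.
# Group 2: the body, matched lazily so it stops as early as possible.
# The lookahead ends the body right before the next "The law" or at the
# end of the string, without consuming the next title.
pattern = re.compile(r"(The law of \w+)\s+(.*?)\s*(?=The law|$)")

# With two capture groups, findall returns a list of (title, body) tuples.
pairs = pattern.findall(text)

# Put each title on its own line, followed by its body.
rephrased = "\n\n".join(f"{title}\n{body}" for title, body in pairs)
print(rephrased)
```

Because the lookahead does not consume "The law", each match ends where the next title begins, which is what lets the two-group pattern repeat across the whole line.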


regex

pcre