How to write a nested regex to find words below some string?

Question

I am converting a PDF to text with xpdf and then finding some words with the help of a regex and preg_match_all.

I am separating my words with a colon in pdftotext.

Here is my pdftotext output:

                                 In respect of Shareholders

Name:                                    xyx

Residential address:                     dublin

No of Shares:                            2

Name:                                    abc

Residential address:                     canada

No of Shares:                            2

So I wrote a regex that returns the words after the colon in the text:

$regex = '/(?<=: ).+/';
preg_match_all($regex, $string, $matches);
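A minimal sketch of what that regex actually returns, assuming the pdftotext output has been read into `$string` (the sample below is a shortened, hypothetical copy of the output above). The lookbehind requires ": " but consumes only that one space, so every match keeps the rest of the column padding:

```php
<?php
// Hypothetical, shortened copy of the pdftotext output shown above.
$string = "    In respect of Shareholders\n"
    . "Name:      xyx\n"
    . "Residential address:      dublin\n"
    . "No of Shares:      2\n";

// (?<=: ) only checks for ": " without consuming it, so .+ starts right after
// the first space and runs to the end of the line, keeping the extra padding.
preg_match_all('/(?<=: ).+/', $string, $matches);

foreach ($matches[0] as $value) {
    // var_export makes the leading spaces visible.
    echo var_export($value, true), PHP_EOL;
}
```

The heading line contains no colon, so it produces no match; each value line produces one match that still starts with spaces.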

But now I want a regex that returns all the data after In respect of Shareholders.

So I wrote $regex = '/(?<=In respect of Shareholders).*?(?=\s)/';

But it only shows me:

Name:                                    xyx

I want to first find all the data after In respect of Shareholders, and then use another regex to find the words after the colon.
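The two-step idea described here can be sketched as follows. This is an illustrative sketch, not one of the answers below: it assumes the heading appears exactly once, and the sample text (including the hypothetical `Company name` line above the heading) is made up to show that data before the heading is ignored.

```php
<?php
// Hypothetical sample: one line above the heading, then the shareholder block.
$string = "Company name:      Example Ltd\n"
    . "    In respect of Shareholders\n"
    . "Name:      xyx\n"
    . "Residential address:      dublin\n"
    . "No of Shares:      2\n"
    . "Name:      abc\n"
    . "Residential address:      canada\n"
    . "No of Shares:      2\n";

$values = [];

// Step 1: capture everything after the heading. The s modifier lets . match newlines.
if (preg_match('/In respect of Shareholders(.*)/s', $string, $block)) {
    // Step 2: on that block only, capture the text after each colon and its padding.
    preg_match_all('/:\h*(.+)/', $block[1], $matches);
    $values = $matches[1];
}

print_r($values);
```

Splitting the work this way keeps each regex simple; the single-pattern `\G` approach in Answer 2 does both steps in one pass.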

Answer 1

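In your regex (?<=: ).+ you match any character 1 or more times after a colon and a space, so the padding after that first space becomes part of the match. To capture everything after the spaces or tabs in a group, you could use (?<=: )[\t ]*(.+)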
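A minimal PHP sketch of the capture-group idea, using `[\t ]*` so that all of the padding stays outside group 1 (the sample string is a shortened, hypothetical copy of the question's output):

```php
<?php
// Hypothetical, shortened copy of the pdftotext output shown in the question.
$string = "Name:      xyx\n"
    . "Residential address:      dublin\n"
    . "No of Shares:      2\n";

// [\t ]* consumes the remaining padding, so group 1 starts at the value itself.
preg_match_all('/(?<=: )[\t ]*(.+)/', $string, $matches);

// $matches[0] still contains the padding; $matches[1] holds only the values.
print_r($matches[1]);
```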

Another way to match the text using a capturing group is:

^.*?:[ \t]+(\w+)

Explanation

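^ Assert the start of the string (the start of each line when the m modifier is used)
.*?: Match any character non-greedily, followed by :
[ \t]+ Match 1+ spaces or tabs
(\w+) Capture 1+ word characters in a group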
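A minimal PHP sketch of this pattern, assuming the m modifier so that `^` matches at every line start (without it, `^` only matches at the very beginning of the string and preg_match_all finds at most one match). The sample string is a shortened, hypothetical copy of the question's output:

```php
<?php
// Hypothetical, shortened copy of the pdftotext output shown in the question.
$string = "    In respect of Shareholders\n"
    . "Name:      xyx\n"
    . "Residential address:      dublin\n"
    . "No of Shares:      2\n";

// The m modifier makes ^ match at the start of every line, not just the string.
preg_match_all('/^.*?:[ \t]+(\w+)/m', $string, $matches);

// Group 1 holds the captured value from each line.
print_r($matches[1]);
```

Note that `\w+` stops at the first non-word character, so a multi-word value (for example an address containing a space) would be cut short; `(.+)` in place of `(\w+)` keeps the rest of the line.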


Or use \K to forget what was matched so far (if your regex engine supports it; PHP's PCRE does):

^.*?:\h*\K\w+
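A minimal PHP sketch of the `\K` variant, again assuming the m modifier and a shortened, hypothetical sample. Because `\K` resets the start of the reported match, the values land directly in `$matches[0]` and no capture group is needed:

```php
<?php
// Hypothetical, shortened copy of the pdftotext output shown in the question.
$string = "    In respect of Shareholders\n"
    . "Name:      xyx\n"
    . "Residential address:      dublin\n"
    . "No of Shares:      2\n";

// \h* eats the horizontal padding, then \K drops "label:" and the padding from
// the reported match, so $matches[0] holds only the value. m makes ^ per-line.
preg_match_all('/^.*?:\h*\K\w+/m', $string, $matches);

print_r($matches[0]);
```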


Answer 2

You can use

if (preg_match_all('~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~', $string, $matches)) {
    print_r($matches[0]);
}


Details

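(?:\G(?!\A)|In respect of Shareholders) - the end of the previous successful match, or the text In respect of Shareholders
\s* - 0+ whitespace characters (including line breaks)
[^:\n\r]+ - 1+ characters other than :, CR and LF
: - a colon
\h* - 0+ horizontal whitespace characters
\K - the match reset operator, which discards all text matched so far
.* - the rest of the line (0 or more characters other than line break characters)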
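A sketch that runs this pattern on a hypothetical sample with one line above the heading and a second, colon-less section heading below the shareholder block. It illustrates how the `\G` chain behaves: a match can only start at the literal heading or exactly where the previous match ended, so the `Company name` line is never reached, and the chain breaks at the first line without a colon, leaving `foo` unmatched:

```php
<?php
// Hypothetical sample: a line above the heading and a second section below it.
$string = "Company name:      Example Ltd\n"
    . "    In respect of Shareholders\n"
    . "Name:      xyx\n"
    . "Residential address:      dublin\n"
    . "No of Shares:      2\n"
    . "Name:      abc\n"
    . "Residential address:      canada\n"
    . "No of Shares:      2\n"
    . "    In respect of Directors\n"
    . "Name:      foo\n";

$regex = '~(?:\G(?!\A)|In respect of Shareholders)\s*[^:\r\n]+:\h*\K.*~';

if (preg_match_all($regex, $string, $matches)) {
    // Only the six shareholder values: matching starts at the heading and
    // stops at "In respect of Directors", which has no colon.
    print_r($matches[0]);
}
```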

regex

preg-match-all