Select some string from sentence

Question

I'm new to regular expressions and want to select some strings that follow this rule:

Select "beautiful" (zero or one occurrence) and "people00" or "peopleXXX" if matched.

Sentences:

"I am beautiful, charming and friendly people00"
"I am beautiful, charming and friendly peopleXXX"
"I am charming and friendly people00"
"I am charming and friendly peopleXXX"

Currently I use the rule below to get the two strings:

(?i)(beautiful| ).*(people[a-zA-Z0-9]{2,3})

Is there any other way to select them?

I tried using (beautiful)?.*(people[a-zA-Z0-9]{2,3}) directly, but it does not work.

Answer 1

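You have a greedy dot-matching subpattern in your regex. The group never captures beautiful because that word is consumed by .* instead. The optional group (beautiful)? matches nothing (#1, the group is "non-participating"), and the obligatory group (beautiful| ) matches the first space (#2). To avoid that, you need to restrict what is matched between beautiful and peopleXXX to anything but beautiful.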
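The behaviour described above can be reproduced with a small sketch. It uses Python's re module rather than AutoIt (an assumption made only for illustration; the patterns themselves are copied from the question), and the helper name groups is hypothetical.

```python
import re

SENTENCES = [
    "I am beautiful, charming and friendly people00",
    "I am charming and friendly peopleXXX",
]

# The two patterns from the question, unchanged.
OPTIONAL = re.compile(r"(?i)(beautiful)?.*(people[a-zA-Z0-9]{2,3})")
OBLIGATORY = re.compile(r"(?i)(beautiful| ).*(people[a-zA-Z0-9]{2,3})")


def groups(pattern, text):
    """Return (group 1, group 2) of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.groups() if m else None


for s in SENTENCES:
    # Group 1 is None for OPTIONAL (non-participating group)
    # and a single space for OBLIGATORY (the first space in the sentence).
    print(groups(OPTIONAL, s), groups(OBLIGATORY, s))
```

In both cases the word beautiful ends up inside the text matched by .* and never reaches group 1.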

One way is to use a tempered greedy token:

(?i)(beautiful)?(?:(?!beautiful).)*(people[a-zA-Z0-9]{2,3})
                ^^^^^^^^^^^^^^^^^^^

See demo

(?:(?!beautiful).)* matches any symbol other than a newline, as long as it does not start the sequence beautiful.
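As a quick check, here is a minimal sketch that runs the tempered greedy token pattern over the four sentences from the question. It assumes Python's re module instead of AutoIt; the function name select is hypothetical.

```python
import re

# Tempered greedy token: consume a character only if "beautiful" does not start there.
TEMPERED = re.compile(
    r"(?i)(beautiful)?(?:(?!beautiful).)*(people[a-zA-Z0-9]{2,3})"
)

SENTENCES = [
    "I am beautiful, charming and friendly people00",
    "I am beautiful, charming and friendly peopleXXX",
    "I am charming and friendly people00",
    "I am charming and friendly peopleXXX",
]


def select(text):
    """Return (group 1, group 2) of the first match, or None if nothing matches."""
    m = TEMPERED.search(text)
    return m.groups() if m else None


for s in SENTENCES:
    # Group 1 is "beautiful" when the word is present and None otherwise.
    print(select(s))
```

Because the token cannot step over beautiful, a match attempt that starts before that word fails, and the engine retries until it starts exactly at beautiful, where group 1 can capture it.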

Another way is to use an unrolled version of this tempered greedy token:

(?i)(beautiful)?[^b]*(?:b(?!eautiful)[^b]*)*(people[a-zA-Z0-9]{2,3})

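See another demo

[^b]*(?:b(?!eautiful)[^b]*)* matches any text that does not contain the character sequence beautiful: zero or more characters other than b ([^b]*), then any number of repetitions of a b that is not followed by eautiful (b(?!eautiful)), each followed by zero or more characters other than b.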
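To see that the unrolled pattern selects the same strings as the tempered greedy token, the two can be compared side by side. This is only a sketch in Python's re module (not AutoIt); the extra sentence containing several other b characters is made up for the comparison, and the helper name groups is hypothetical.

```python
import re

TEMPERED = re.compile(
    r"(?i)(beautiful)?(?:(?!beautiful).)*(people[a-zA-Z0-9]{2,3})"
)
UNROLLED = re.compile(
    r"(?i)(beautiful)?[^b]*(?:b(?!eautiful)[^b]*)*(people[a-zA-Z0-9]{2,3})"
)

SENTENCES = [
    "I am beautiful, charming and friendly people00",
    "I am charming and friendly peopleXXX",
    # Made-up sentence: every "b" before "beautiful" goes through b(?!eautiful).
    "I am a bit bubbly but beautiful, friendly people00",
]


def groups(pattern, text):
    """Return (group 1, group 2) of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.groups() if m else None


for s in SENTENCES:
    print(groups(TEMPERED, s), groups(UNROLLED, s))
```

One difference to keep in mind: [^b] also matches newlines, while the dot in the tempered version does not unless a dot-all mode is enabled.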

Note: to make this regex a bit more efficient, you can include a check for peopleXXX:

(?i)(beautiful)?(?:(?!beautiful|people[a-zA-Z0-9]).)*(people[a-zA-Z0-9]{2,3})

And an unrolled one:

(?i)(beautiful)?[^bp]*(?:p(?!eople[a-zA-Z0-9])[^bp]*|b(?!eautiful)[^bp]*)*(people[a-zA-Z0-9]{2,3})
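Besides saving backtracking, the peopleXXX check changes which token is selected when a sentence contains more than one. The sketch below shows this with Python's re module (not AutoIt) on a made-up sentence. The unrolled pattern is written with eople and with [^bp]* after both branches, which is my assumption about the intended form; the names BASIC, CHECKED, UNROLLED and groups are hypothetical.

```python
import re

# Tempered token without the peopleXXX check.
BASIC = re.compile(
    r"(?i)(beautiful)?(?:(?!beautiful).)*(people[a-zA-Z0-9]{2,3})"
)
# Tempered token that also refuses to step over a peopleXXX token.
CHECKED = re.compile(
    r"(?i)(beautiful)?(?:(?!beautiful|people[a-zA-Z0-9]).)*(people[a-zA-Z0-9]{2,3})"
)
# Unrolled form of CHECKED (assumed intended spelling, see the note above).
UNROLLED = re.compile(
    r"(?i)(beautiful)?[^bp]*(?:p(?!eople[a-zA-Z0-9])[^bp]*|b(?!eautiful)[^bp]*)*"
    r"(people[a-zA-Z0-9]{2,3})"
)

# Made-up sentence with two peopleXX tokens.
TEXT = "I am beautiful people00 and friendly people11"


def groups(pattern, text):
    """Return (group 1, group 2) of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.groups() if m else None


print(groups(BASIC, TEXT))     # greedy token runs to the last peopleXX
print(groups(CHECKED, TEXT))   # stops at the first peopleXX
print(groups(UNROLLED, TEXT))  # same result as CHECKED
```

With a single peopleXXX per sentence, as in the question, all three patterns select the same strings.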

regex

autoit