Regex match two strings with given number of words in between strings

Question

I want to match the characters between two strings, allowing a given number of words between them.

For example:

text = 'I want apples and oranges'
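The parameters are 'apples', 'oranges' and k=2, the maximum number of words allowed between these strings. I expect the output to be 'apples and oranges', because there is only one word between the two given strings.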

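This is very similar to the (?<=...) pattern in regex, but I can't define how many words are acceptable in between, and I want to extract the relevant text as a whole rather than only the content in between.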

What I have now:

import re
text = 'I want apples and oranges'
pattern = "(?<=apples)(.*)(?=oranges)"
m = re.search(pattern, text)
print(m)

<re.Match object; span=(13, 18), match=' and '>

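This outputs ' and '. But I want to get apples and oranges as the output, not just the part between them. I also want to be able to limit the number of acceptable words between apples and oranges. For example, if I define k = 2 and the sentence is "I want apples along with some oranges", it should not match, because there are 3 words between apples and oranges.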
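A small sketch of the two shortcomings described above, run on the three-word example sentence: the lookarounds check "apples" and "oranges" without consuming them, so they are left out of the match, while swapping the lookarounds for plain literals includes them but `.*` still accepts any number of words in between.

```python
import re

text = "I want apples along with some oranges"

# Lookarounds: the anchors are checked but not consumed, so only the middle is matched
inner = re.search(r"(?<=apples)(.*)(?=oranges)", text).group()
print(repr(inner))  # ' along with some '

# Plain literals: the anchors are part of the match, but .* places no limit on words
whole = re.search(r"apples(.*)oranges", text).group()
print(repr(whole))  # 'apples along with some oranges'
```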

Does anyone know whether I can do this with a regex as well?

Answer 1

You can use something like

import re
text = 'I want apples and oranges'
k = 2
pattern = rf"apples(?:\s+\w+){{0,{k}}}\s+oranges"  # raw f-string: {{ }} produce literal braces
m = re.search(pattern, text)
if m:
    print(m.group())

# => apples and oranges
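The same idea can be wrapped in a small function so the two strings and k are parameters, as the question asks. This is a sketch: `match_with_gap` is a hypothetical name, and `re.escape` is added as an assumption so that strings containing regex metacharacters are matched literally. The second call reproduces the question's negative case (3 words in between, k = 2).

```python
import re

def match_with_gap(text, first, second, k):
    """Return the text from `first` to `second` if at most k words separate them, else None.

    Hypothetical helper; a "word" is a run of word characters preceded by whitespace.
    """
    pattern = rf"{re.escape(first)}(?:\s+\w+){{0,{k}}}\s+{re.escape(second)}"
    m = re.search(pattern, text)
    return m.group() if m else None

print(match_with_gap("I want apples and oranges", "apples", "oranges", 2))
# apples and oranges
print(match_with_gap("I want apples along with some oranges", "apples", "oranges", 2))
# None
```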

Here, I used \w+ to match a word. If words are chunks of non-whitespace characters, you need to use

pattern = rf"apples(?:\s+\S+){{0,{k}}}\s+oranges"

See this Python demo.
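To illustrate when the \S+ variant matters, here is a sketch with an assumed example sentence in which one of the in-between "words" contains punctuation. \w+ cannot match "(fresh)", so the first pattern fails, while \S+ treats it as one chunk.

```python
import re

k = 2
text = "I want apples (fresh) and oranges"  # assumed example sentence

word_pattern = rf"apples(?:\s+\w+){{0,{k}}}\s+oranges"   # words are runs of \w
chunk_pattern = rf"apples(?:\s+\S+){{0,{k}}}\s+oranges"  # words are non-whitespace chunks

print(re.search(word_pattern, text))           # None: "(fresh)" is not a \w+ word
print(re.search(chunk_pattern, text).group())  # apples (fresh) and oranges
```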

If you need to add word boundaries, you should study the … and … posts. For the current example, fr"\bapples(?:\s+\w+){{0,{k}}}\s+oranges\b" will work.
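A quick sketch of what the word boundaries change, using an assumed sentence where "apples" occurs only as part of "pineapples". Without \b the pattern matches inside the longer word; with \b it does not.

```python
import re

k = 2
text = "I want pineapples and oranges"  # assumed example sentence

plain = rf"apples(?:\s+\w+){{0,{k}}}\s+oranges"
bounded = rf"\bapples(?:\s+\w+){{0,{k}}}\s+oranges\b"

print(re.search(plain, text).group())  # apples and oranges (found inside "pineapples")
print(re.search(bounded, text))        # None
```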

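The pattern looks like apples(?:\s+\w+){0,k}\s+oranges and matches

apples - the string apples
(?:\s+\w+){0,k} - zero to k repetitions of one or more whitespace characters followed by one or more word characters
\s+ - one or more whitespace characters
oranges - the string oranges.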
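The breakdown above can also live inside the pattern itself. This sketch uses Python's re.VERBOSE flag, where unescaped whitespace is ignored and # starts a comment, so each piece of the pattern carries its explanation. It is the same regex as above with k = 2, just laid out differently.

```python
import re

k = 2
# Same pattern as above; re.VERBOSE ignores the layout whitespace and the # comments
pattern = re.compile(rf"""
    apples                # the string 'apples'
    (?:\s+\w+){{0,{k}}}   # 0 to k repetitions of: whitespace, then a word
    \s+                   # one or more whitespace characters
    oranges               # the string 'oranges'
""", re.VERBOSE)

print(pattern.search("I want apples and oranges").group())      # apples and oranges
print(pattern.search("I want apples along with some oranges"))  # None
```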
