How to extract the text between two words in a document with Python 3 re?

I want to extract the text between Love and OK with the following code, but it doesn't work.

import re

document = "This is a document with random words Love apples oranges pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."

x = re.search("^Love.*OK$", document)

I want to get the following text: apples oranges pears jeep plane car water cola coffee

We can use re.findall with a slightly modified version of your current regex pattern to find all matching substrings, and then join the resulting list into a single string.

import re

document = "This is a document with random words Love apples oranges pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."
# Capture the text between each Love ... OK pair (non-greedy, so each pair is matched separately)
matches = re.findall(r'\bLove (.*?) OK\b', document)
print(' '.join(matches))

This prints:

apples oranges pears jeep plane car water cola coffee

Explanation:

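The regex pattern \bLove (.*?) OK\b captures the content between each pair of Love ... OK markers; the \b word boundaries ensure that Love and OK are matched only as whole words. In this case, it yields three substrings. We then use join() to concatenate the list returned by re.findall into a single string.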

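The symbol ^ matches the start of the string and $ matches the end of the string, so they don't apply here: the document neither starts with Love nor ends with OK, so the pattern never matches.

Just remove them, and use the non-greedy .*? so that it matches the shortest possible string: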

import re

document = "This is a document with random words Love apples oranges pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."

# Non-greedy .*? stops at the first OK after Love
x = re.search("Love.*?OK", document).group()
print(x)  # Love apples oranges pears OK

# Greedy .* runs on to the last OK in the string
x = re.search("Love.*OK", document).group()
print(x)  # Love apples oranges pears OK some thing Love jeep plane car OK any more Love water cola coffee OK
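The point about ^ and $ can be checked directly. This minimal sketch runs the question's anchored pattern against the document (it returns None) and then against a short made-up string that really does start with Love and end with OK.

```python
import re

document = "This is a document with random words Love apples oranges pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."

# ^ requires "Love" at the very start and $ requires "OK" at the very end.
# The document starts with "This" and ends with "bra.", so nothing matches.
anchored = re.search("^Love.*OK$", document)
print(anchored)  # None

# The anchored pattern only succeeds when the whole string has that shape.
whole = re.search("^Love.*OK$", "Love apples OK")
print(whole.group())  # Love apples OK
```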

You can do it like this.

Use re.findall() to get a list of all matches of the pattern.

Love\s(.*?)\sOK: this pattern captures whatever lies between the word Love and the word OK. The non-greedy .*? stops at the nearest OK, so each Love ... OK pair is matched separately.

import re
s = "this is a document with random words Love apples oranges pears OK Love jeep plane car OK Love water cola coffee OK bra bra."
# Raw string, so that \s reaches the regex engine instead of being an invalid string escape
d = re.findall(r'Love\s(.*?)\sOK', s)

print(d)
print(' '.join(d))

Output:

['apples oranges pears', 'jeep plane car', 'water cola coffee']

apples oranges pears jeep plane car water cola coffee
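This answer relies on re.findall returning only what the capturing group (.*?) matched, not the whole Love ... OK text. A minimal sketch, on a shortened made-up string, of how the return value changes with the number of groups in the pattern:

```python
import re

s = "Love apples oranges pears OK Love jeep plane car OK"

# No capturing group: findall returns the full text of each match.
no_group = re.findall(r'Love\s.*?\sOK', s)

# One capturing group: findall returns only what that group captured.
one_group = re.findall(r'Love\s(.*?)\sOK', s)

# Two or more groups: findall returns one tuple of groups per match.
two_groups = re.findall(r'(Love)\s(.*?)\sOK', s)

print(no_group)    # ['Love apples oranges pears OK', 'Love jeep plane car OK']
print(one_group)   # ['apples oranges pears', 'jeep plane car']
print(two_groups)  # [('Love', 'apples oranges pears'), ('Love', 'jeep plane car')]
```

This is why ' '.join(d) works directly in the answer: with a single group, every element of the list is already a plain string containing only the words between the markers.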