How do I extract the text between two words in a document with Python 3's re module?

Question

I want to extract the text between Love and OK with the following code, but it does not work.

document = "This is a document with random words Love apples ornages pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."

x = re.search("^Love.*OK$", document)

I want to get the following text: apples ornages pears jeep plane car water cola coffee

Answer 1

We can use re.findall with a slightly modified version of your current regex pattern to find all matching substrings, then join the resulting list into a single string.

import re

document = "This is a document with random words Love apples oranges pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."
matches = re.findall(r'\bLove (.*?) OK\b', document)
print(' '.join(matches))

This prints:

apples oranges pears jeep plane car water cola coffee

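Explanation:

The regex pattern \bLove (.*?) OK\b captures whatever lies between each pair of Love ... OK markers. The lazy quantifier .*? stops at the nearest OK rather than the last one, so here it yields three substrings. We then use join() to combine the list returned by re.findall into a single string.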

Answer 2

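The symbol ^ matches the start of the string and $ matches the end, so they do not apply here: the document neither begins with Love nor ends with OK.

Just remove them, and use .*? to match the shortest possible string.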

import re

document = "This is a document with random words Love apples ornages pears OK some thing Love jeep plane car OK any more Love water cola coffee OK bra bra."

x = re.search("Love.*?OK", document).group()
print(x)  # Love apples ornages pears OK

x = re.search("Love.*OK", document).group()
print(x)  # Love apples ornages pears OK some thing Love jeep plane car OK any more Love water cola coffee OK
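
As a rough follow-up sketch (the shortened document string and the variable names anchored, match and inner are only for illustration), this shows that the anchored pattern from the question finds nothing, and that a capturing group returns just the inner text without the Love and OK markers. Since re.search returns None when nothing matches, calling .group() on the result directly would raise AttributeError, so the sketch checks for that first.

```python
import re

document = "This is a document with random words Love apples oranges pears OK some thing"

# Anchored pattern: the whole string would have to start with Love and end with OK.
anchored = re.search(r"^Love.*OK$", document)
print(anchored)  # None

# Unanchored and lazy, with a capturing group around the inner text only.
match = re.search(r"Love\s+(.*?)\s+OK", document)
inner = match.group(1) if match else None  # guard against a missing match
print(inner)  # apples oranges pears
```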

Answer 3

You can do it like this.

Use re.findall() to get a list of all matches of the pattern.

Love\s(.*?)\sOK - this pattern captures everything between the words Love and OK.

import re
s = "this is a document with random words Love apples ornages pears OK Love jeep plane car OK Love water cola coffee OK bra bra."
d = re.findall(r'Love\s(.*?)\sOK', s)  # raw string, so \s is not treated as a string escape

print(d)
print(' '.join(d))

['apples ornages pears', 'jeep plane car', 'water cola coffee']

apples ornages pears jeep plane car water cola coffee
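
If you need this for more than one pair of words, the same idea can be wrapped in a small helper. This is only a sketch: extract_between is a made-up name, it assumes both markers are plain words (so that \b makes sense around them), and it passes re.DOTALL on the assumption that the text between the markers may span several lines.

```python
import re

def extract_between(text, start, end):
    """Return a list of the text found between each start ... end pair."""
    # re.escape guards against markers that contain regex metacharacters.
    pattern = rf"\b{re.escape(start)}\s+(.*?)\s+{re.escape(end)}\b"
    # re.DOTALL lets '.' match newlines, so a match can cross line breaks.
    return re.findall(pattern, text, flags=re.DOTALL)

# Hypothetical two-line sample based on the question's document.
s = "words Love apples\noranges pears OK Love jeep plane car OK bra bra."
parts = extract_between(s, "Love", "OK")
print(parts)
print(' '.join(parts))
```

Drop the flags argument if a match should never run past the end of a line.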

python

python-re