How to extract a few preceding words after finding a keyword in text using Python

Question

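I have a keyword, "grand master", and I am searching for it in a large text. I need to extract the 5 words before and the 5 words after the keyword (depending on its position, these may spill into the previous or next sentence). The keyword appears multiple times in the text.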

As a starting point, I used text.find() to locate the keyword in the text and found it at 4 different positions:

>>> positions
[125, 567, 34445, 98885445]

So I tried splitting the text on whitespace and taking the 5 words that follow the keyword:

text[positions[i]:].split()[len(keyword.split()):len(keyword.split())+5]

But how do I extract the 5 words before the keyword?

Answer 1

You can simply use:

text[:positions[i]].split()[-5:]
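
To cover every occurrence rather than one position at a time, the slicing idea above can be wrapped in a small helper. This is only a minimal sketch: the function name context_words and its parameters are illustrative assumptions, not something from the question, and it treats any run of whitespace as a word separator.

```python
def context_words(text, keyword, n=5):
    """Return a (before, after) pair of word lists for every occurrence of keyword."""
    results = []
    start = text.find(keyword)
    while start != -1:
        end = start + len(keyword)
        before = text[:start].split()[-n:]  # last n words before the keyword
        after = text[end:].split()[:n]      # first n words after the keyword
        results.append((before, after))
        start = text.find(keyword, end)     # resume the search after this match
    return results


sample = "one two three four five six grand master seven eight nine ten eleven twelve"
print(context_words(sample, "grand master"))
```

Note that each iteration splits the whole prefix and suffix, so on a very large text with many matches it may be worth slicing a bounded window around each position first.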

Answer 2

Use the re module for this. For the first keyword match:

import re

# \S+ matches a single whitespace-free word; .+ would greedily swallow several words
pattern = r"(\S+) (\S+) (\S+) (\S+) (\S+) grand master (\S+) (\S+) (\S+) (\S+) (\S+)"
match = re.search(pattern, text)
if match:
    firstword_before = match.group(1)  # first pair of parentheses
    lastword_before = match.group(5)

    firstword_after = match.group(6)
    lastword_after = match.group(10)

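The parentheses in the pattern define numbered groups. The first pair of parentheses corresponds to match.group(1), the second pair to match.group(2), and so on. If you want all the groups, you can use: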

match.groups() # returns tuple of groups

or

match.group(0) # returns the entire matched string
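
As a quick illustration of the difference between the two calls, here is a small sketch on a made-up sample string. Each word is matched with \S+ (a run of non-whitespace characters), which is an assumption made for this example.

```python
import re

text = "a b c d e grand master f g h i j"
pattern = r"(\S+) (\S+) (\S+) (\S+) (\S+) grand master (\S+) (\S+) (\S+) (\S+) (\S+)"
match = re.search(pattern, text)

groups = match.groups()  # tuple of the ten captured words, keyword excluded
whole = match.group(0)   # the full matched substring, keyword included
print(groups)
print(whole)
```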

For all keyword matches in the text, use re.findall. Read the re documentation for details.
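
One possible way to handle every match is sketched below. It uses re.finditer, a close relative of re.findall that yields match objects carrying start and end positions, and then slices the words on either side. The helper name keyword_contexts is hypothetical, and the word-boundary anchors assume the keyword starts and ends with a letter or digit.

```python
import re


def keyword_contexts(text, keyword, n=5):
    """Return a (before, after) pair of word lists for each whole-word match of keyword."""
    # Allow any whitespace (including newlines) between the keyword's words,
    # and require word boundaries so "grand masterpiece" is not matched.
    parts = [re.escape(word) for word in keyword.split()]
    pattern = re.compile(r"\b" + r"\s+".join(parts) + r"\b")
    contexts = []
    for m in pattern.finditer(text):
        before = text[:m.start()].split()[-n:]  # last n words before the match
        after = text[m.end():].split()[:n]      # first n words after the match
        contexts.append((before, after))
    return contexts


sample = ("He met a grand master in Paris. The grand\nmaster won easily, "
          "but no grand masterpiece followed.")
for before, after in keyword_contexts(sample, "grand master"):
    print(before, after)
```

Slicing around each match, rather than capturing the neighbours inside the pattern, keeps overlapping contexts intact when two occurrences are fewer than five words apart.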

P.S. There are better ways to write the pattern. I was just being lazy.

python

nlp

n-gram