Extracting words/phrase followed by a phrase

Question

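I have a text file containing a list of phrases. Here is what the file looks like:

Filename: KP.txt

From the input below (a paragraph), I want to extract the next two words after a phrase from KP.txt (the phrase can be any of those listed in the KP.txt file above). I only need the next 2 words.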

Input:

This is Lee. Thanks for contacting me. I wanted to know the exchange policy at Noriaqer hardware services.

In the example above, the phrase "I wanted to know" matches an entry in KP.txt. So if I extract the next 2 words after it, my output should be "exchange policy".

How can I extract this in Python?

Answer 1

You can use this:

# Read the phrases in lowercase; skip blank lines, which would otherwise break the split below
with open("KP.txt") as fobj:
    phrases = [line.strip().lower() for line in fobj if line.strip()]

paragraph = input("Enter The Whole Paragraph in one line:\t").lower()

for phrase in phrases:
    if phrase in paragraph:
        # Every piece after an occurrence of the phrase starts with the words we want
        for clause in paragraph.split(phrase)[1:]:
            print(" ".join(clause.split()[:2]))

Answer 2

I think natural language processing might be a better solution, but this code should help :)

def search_in_text(kp, text):
    for line in kp:
        line = line.strip()
        # skip blank lines: an empty string is "in" every text
        if not line:
            continue
        # if a search phrase from kp is found in the text
        if line in text:
            # start index of the text that follows the phrase
            i1 = text.find(line) + len(line)
            # end index of the window to inspect (at most 50 characters past i1)
            i2 = min(i1 + 50, len(text))
            # split the window into words; split() drops empty strings
            next_words = text[i1:i2].split()
            # return only the next two words
            return next_words[:2]
    return []  # return an empty list if no phrase matches

# read your KP file as a list of lines
with open("KP.txt") as f:
    kp = f.read().split("\n")

#input 1 
text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
print('input ->>',text)
output = search_in_text(kp,text)
print('output ->>',output)

input ->> This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.
output ->> ['exchange', 'policy']

#input 2
text = 'Boss was very angry and said: I wish to know why you are late?'
print('input ->>',text)
output = search_in_text(kp,text)
print('output ->>',output)

input ->> Boss was very angry and said: I wish to know why you are late?
output ->> ['why', 'you']

Answer 3

Assuming you already know how to read the input file into a list, this can be done with the help of a regular expression.

>>> import re
>>> wordlist = ['I would like to understand', 'I wanted to know', 'I wish to know', 'I am interested to know']
>>> input_text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
>>> def word_extraction(input_text, wordlist):
...     for word in wordlist:
...         if word in input_text:
...             # re.escape keeps any regex metacharacters in the phrase literal
...             output = re.search(r'(?<=%s)(.\w*){2}' % re.escape(word), input_text)
...             print(output.group().lstrip())
...
>>> word_extraction(input_text, wordlist)
exchange policy
>>> input_text = 'This is Lee. Thanks for contacting me. I wish to know where is Noriaqer hardware.'
>>> word_extraction(input_text, wordlist)
where is
>>> input_text = 'This is Lee. Thanks for contacting me. I\'d like to know where is Noriaqer hardware.'
>>> word_extraction(input_text, wordlist)

>>>

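First we check whether one of our phrases is in the sentence. This is not the most efficient approach if your list is large, but it works for now.
Next, if the phrase is in our "dictionary" of phrases, we use a regular expression to extract the words we want.
Finally, we strip the leading whitespace in front of the target words.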
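
The three steps above can also be folded into one function. What follows is only a sketch, not the answerer's code: the name `extract_next_words` and the parameter `n` are made up for illustration, and it assumes that compiling all phrases into a single alternation pattern is acceptable in place of the `in` check per phrase.

```python
import re

def extract_next_words(text, phrases, n=2):
    """Return the n words following the first matching phrase, or None."""
    # Drop blank entries (e.g. empty lines read from KP.txt).
    phrases = [p.strip() for p in phrases if p.strip()]
    if not phrases:
        return None
    # Escape each phrase so it is matched literally; try longer phrases first
    # so a short phrase cannot shadow a longer one starting at the same place.
    alternatives = "|".join(
        re.escape(p) for p in sorted(phrases, key=len, reverse=True)
    )
    # After the phrase, capture n words, each preceded by whitespace.
    pattern = re.compile(r"(?:%s)((?:\s+\w+){%d})" % (alternatives, n))
    match = pattern.search(text)
    if match is None:
        return None
    # split() also removes the leading whitespace.
    return match.group(1).split()

wordlist = ['I would like to understand', 'I wanted to know',
            'I wish to know', 'I am interested to know']
text = ('This is Lee. Thanks for contacting me. '
        'I wanted to know exchange policy at Noriaqer hardware services.')
print(extract_next_words(text, wordlist))  # ['exchange', 'policy']
```

Returning a list (or `None` when nothing matches) instead of printing makes the result easier to reuse, at the cost of reporting only the first match in the text.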

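Regex hints:

(?<=%s) is a lookbehind assertion. It matches only at a position immediately preceded by the phrase substituted for %s (for example "I wanted to know"), so the phrase itself is not part of the match.
(.\w*){2} matches any single character followed by zero or more word characters, twice over. In practice that is a space plus a word, two times, so the match stops 2 words after the key phrase.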
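
To see what the two pieces actually match, here is a small sketch that runs the same pattern on a shortened version of the sample sentence. The variable names are illustrative only, and the last line shows a caveat that follows from `.` matching any character, not just a space.

```python
import re

text = "I wanted to know exchange policy at Noriaqer hardware services."
phrase = "I wanted to know"

# The lookbehind anchors the match right after the phrase without consuming it.
pattern = r"(?<=%s)(.\w*){2}" % re.escape(phrase)
match = re.search(pattern, text)

raw = match.group()    # keeps the leading space: " exchange policy"
words = raw.lstrip()   # "exchange policy"

# Caveat: "." matches any character, so a comma counts as the start of a "word".
odd = re.search(r"(?<=know)(.\w*){2}", "I wanted to know, please").group()
print(repr(raw), repr(words), repr(odd))
```

If punctuation right after the phrase is a concern, a stricter pattern such as `(\s+\w+){2}` might be a better fit, depending on what should count as a word.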
