Extracting words/phrase followed by a phrase
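I have a text file containing a list of phrases. Here is what the file looks like:
File name: KP.txt
From the input (paragraph) below, I want to extract the next two words after a phrase from KP.txt (the phrase can be any of those listed in the KP.txt file above). I only need to extract the next 2 words.
Input:
This is Lee. Thanks for contacting me. I wanted to know the exchange policy at Noriaqer hardware services.
In the example above, the phrase "I wanted to know" matches an entry in KP.txt, so extracting the next 2 words after it should give "exchange policy".
How can I extract this in Python?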
You can use this:
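with open("KP.txt") as fobj:
    # lowercase and strip every phrase; drop blank lines, since an empty
    # phrase would make str.split raise ValueError below
    phrases = [line.lower().strip() for line in fobj if line.strip()]
paragraph = input("Enter The Whole Paragraph in one line:\t").lower()
for phrase in phrases:
    if phrase in paragraph:
        # every piece after the first one follows an occurrence of the phrase
        temp = paragraph.split(phrase)[1:]
        for clause in temp:
            print(" ".join(clause.split()[:2]))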
I think natural language processing might be a better solution, but this code should help :)
def search_in_text(kp, text):
    for line in kp:
        # skip blank lines (e.g. from a trailing newline in the file), then
        # check whether this search phrase occurs in the text
        if line and line in text:
            # start index of the text that follows the phrase
            i1 = text.find(line) + len(line)
            # end index of the window to scan (at most 50 characters past i1)
            i2 = min(i1 + 50, len(text))
            # split the window into words, dropping empty strings
            next_words = [word for word in text[i1:i2].split(' ') if word != '']
            # return only the next two words
            return next_words[0:2]
    return []  # return an empty list if no phrase matched

# read your KP file as a list of lines
with open("KP.txt") as f:
    kp = f.read().split("\n")
#input 1
text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
print('input ->>',text)
output = search_in_text(kp,text)
print('output ->>',output)
input ->> This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.
output ->> ['exchange', 'policy']
#input 2
text = 'Boss was very angry and said: I wish to know why you are late?'
print('input ->>',text)
output = search_in_text(kp,text)
print('output ->>',output)
input ->> Boss was very angry and said: I wish to know why you are late?
output ->> ['why', 'you']
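The 50-character window above is an arbitrary cut-off. As a minimal sketch under the same assumptions (the phrases are already loaded into a list; the helper name `next_two_words` is hypothetical and not part of the answer), `str.partition` can split on the first occurrence of a phrase and avoid the index arithmetic:

```python
def next_two_words(phrases, text):
    """Return up to two words that follow the first phrase found in text."""
    for phrase in phrases:
        if not phrase:  # skip blank entries, e.g. from a trailing newline
            continue
        # partition returns (before, separator, after); the separator is
        # an empty string when the phrase does not occur in the text
        _, found, after = text.partition(phrase)
        if found:
            return after.split()[:2]
    return []  # no phrase matched


phrases = ['I would like to understand', 'I wanted to know',
           'I wish to know', 'I am interested to know']
print(next_two_words(phrases, 'Boss was very angry and said: I wish to know why you are late?'))
# prints ['why', 'you']
```

Like the function above, this is case-sensitive and stops at the first phrase that matches.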
Assuming you already know how to read the input file into a list, this can be done with the help of a regular expression.
>>> import re
>>> wordlist = ['I would like to understand', 'I wanted to know', 'I wish to know', 'I am interested to know']
>>> input_text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
>>> def word_extraction(input_text, wordlist):
...     for word in wordlist:
...         if word in input_text:
...             # re.escape keeps any regex metacharacters in the phrase literal
...             output = re.search(r'(?<=%s)(.\w*){2}' % re.escape(word), input_text)
...             print(output.group().lstrip())
...
>>> word_extraction(input_text, wordlist)
exchange policy
>>> input_text = 'This is Lee. Thanks for contacting me. I wish to know where is Noriaqer hardware.'
>>> word_extraction(input_text, wordlist)
where is
>>> input_text = 'This is Lee. Thanks for contacting me. I\'d like to know where is Noriaqer hardware.'
>>> word_extraction(input_text, wordlist)
>>>
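- First we check whether any of the phrases we want is in the sentence. This is not the most efficient approach if your list is large, but it works for now.
- Next, if the sentence contains a phrase from our "dictionary" of phrases, we use a regular expression to extract the words we want.
- Finally, we strip the leading whitespace in front of the target words.
Regex hints:
- (?<=%s) is a lookbehind assertion. It means: only match text that comes immediately after the phrase (for example "I wanted to know").
- (.\w*){2} means any single character followed by zero or more word characters, repeated twice, so the match stops after the 2 words that follow the key phrase.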
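For a large phrase list, one possible alternative to testing each phrase in turn is to compile all phrases into a single alternation. The following is only a sketch: the names `build_pattern` and `extract_all` are made up for illustration, it assumes the phrase list is non-empty, and it treats a "word" as a run of `\w` characters, which is stricter than the original pattern.

```python
import re


def build_pattern(phrases):
    """Compile one case-insensitive regex that matches any of the phrases."""
    # Longest phrases first, so a phrase is preferred over its own prefix.
    ordered = sorted((p for p in phrases if p), key=len, reverse=True)
    # re.escape keeps regex metacharacters inside a phrase literal.
    alternatives = "|".join(re.escape(p) for p in ordered)
    # The capture group holds the two whitespace-separated words after the phrase.
    return re.compile(r"(?:%s)\s+(\w+\s+\w+)" % alternatives, re.IGNORECASE)


def extract_all(pattern, text):
    """Return the two-word string after every phrase occurrence in text."""
    return pattern.findall(text)


wordlist = ['I would like to understand', 'I wanted to know',
            'I wish to know', 'I am interested to know']
pattern = build_pattern(wordlist)
print(extract_all(pattern, 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'))
# prints ['exchange policy']
```

Because the pattern is compiled once, it can be reused for many input paragraphs; an input with no matching phrase simply yields an empty list.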