Deleting all the noun phrases from text using Textblob
I need to remove all proper nouns from the text.
`result` is a DataFrame.
I am using TextBlob. Here is my code:
from textblob import TextBlob
strings = []
for col in result:
    for i in range(result.shape[0]):
        text = result[col][i]
        Txtblob = TextBlob(text)
        for word, pos in Txtblob.noun_phrases:
            print (word, pos)
            if tag != 'NNP'
                print(' '.join(edited_sentence))
It only identifies one NNP.
To remove every word tagged 'NNP' from the following text (taken from the TextBlob documentation), you can do the following:
from textblob import TextBlob
# Sample text (from the TextBlob documentation)
text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''
text = TextBlob(text)
# Collect the words tagged 'NNP' (singular proper noun)
# In this text that is only 'Blob'
words_to_remove = [word for word, tag in text.tags if tag == 'NNP']
# Rebuild the text without those words
edited_sentence = ' '.join(word for word in text.split(' ') if word not in words_to_remove)
# Show the result
print(edited_sentence)
Output (note that the word 'Blob' is gone):
The titular threat of The has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
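The `split(' ')` step in the code above only removes a proper noun that sits between two single spaces; in the sample text, a name followed by punctuation ("Blob,") or at the start of a line ("...\nBlob") would survive. A minimal sketch, assuming you already have the (word, tag) pairs that `TextBlob(text).tags` returns, deletes whole-word matches with a regular expression instead. The function name `remove_tagged_words`, the extra `NNPS` tag (plural proper noun) and the hand-typed tags are illustrative assumptions, not part of TextBlob.

```python
import re


def remove_tagged_words(text, tagged_words, drop_tags=("NNP", "NNPS")):
    """Delete whole-word occurrences of every word whose tag is in drop_tags.

    tagged_words is a list of (word, tag) pairs, such as TextBlob(text).tags.
    Line breaks and punctuation in `text` are left in place.
    """
    targets = {word for word, tag in tagged_words if tag in drop_tags}
    if not targets:
        return text
    # Longest words first so the alternation never stops at a shorter prefix.
    alternatives = "|".join(re.escape(w) for w in sorted(targets, key=len, reverse=True))
    # [ \t]* also removes the space before the word; \b on both sides means
    # "Blob," and "\nBlob" match but "Blobby" does not.
    pattern = r"[ \t]*\b(?:" + alternatives + r")\b"
    return re.sub(pattern, "", text)


# Illustrative input: these tags are typed by hand, not real tagger output.
sample = "I saw The Blob, then\nBlob again. Blobby is fine."
sample_tags = [("I", "PRP"), ("saw", "VBD"), ("The", "DT"), ("Blob", "NNP"),
               ("then", "RB"), ("Blob", "NNP"), ("again", "RB"),
               ("Blobby", "NN"), ("is", "VBZ"), ("fine", "JJ")]

# The split(' ') approach misses both occurrences: the tokens are "Blob," and "then\nBlob".
naive = " ".join(w for w in sample.split(" ") if w not in {"Blob"})
print(repr(naive))
print(repr(remove_tagged_words(sample, sample_tags)))
```

One trade-off of matching by spelling: if a word is tagged NNP once, every other occurrence of the same spelling is removed too, even where the tagger would have labelled it differently.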
Comments on your example
from textblob import TextBlob
strings = []  # This variable is never used
for col in result:
    for i in range(result.shape[0]):
        text = result[col][i]
        txt_blob = TextBlob(text)
        # txt_blob.noun_phrases returns a list of noun phrases (plain strings),
        # so `for word, pos in txt_blob.noun_phrases` tries to unpack each string
        # into two names and fails with a ValueError (except for two-character phrases).
        # To get the position of each phrase, use enumerate(), which yields (index, item) pairs:
        for pos, phrase in enumerate(txt_blob.noun_phrases):
            # Now you can print the position and the phrase
            print(pos, phrase)
            # This prints something like:
            # 0 titular threat
            # 1 blob
            # 2 ultimate movie monster
        # The following line makes no sense: `tag` has never been assigned,
        # you are not iterating over the words from the previous step,
        # and the `if` statement is missing its trailing colon.
        if tag != 'NNP'
            # Nothing is ever assigned to edited_sentence, so this fails as well.
            print(' '.join(edited_sentence))
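The mix-up in the original loop is between `noun_phrases` (a list of strings) and `tags` (a list of (word, tag) pairs). The sketch below uses hand-written stand-ins for both so it runs without TextBlob or its NLTK data; the values are illustrative assumptions, not real tagger output.

```python
# Hypothetical stand-ins for TextBlob output (illustrative values only).
noun_phrases = ["titular threat", "blob", "ultimate movie monster"]        # like blob.noun_phrases
tags = [("The", "DT"), ("Blob", "NNP"), ("has", "VBZ"), ("struck", "VBN")]  # like blob.tags

# enumerate() yields (index, item) pairs: the index comes first.
for pos, phrase in enumerate(noun_phrases):
    print(pos, phrase)

# .tags already yields (word, tag) pairs, so they can be unpacked and filtered directly.
kept_words = [word for word, tag in tags if tag != "NNP"]
print(" ".join(kept_words))

# What the original loop did: unpack each noun phrase (a plain string) into two names.
# A string with more than two characters cannot be unpacked that way.
try:
    for word, pos in noun_phrases:
        pass
    unpack_error = None
except ValueError as err:
    unpack_error = str(err)
print("ValueError:", unpack_error)
```

So to filter by part of speech, iterate over `tags`; `noun_phrases` carries no tag information at all.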
Your example rewritten with the new code
from textblob import TextBlob
for col in result:
    for i in range(result.shape[0]):
        # .iloc selects by position, so this works even if the index is not 0..n-1
        text = result[col].iloc[i]
        txt_blob = TextBlob(text)
        # Collect the words tagged 'NNP' (singular proper noun) in this cell
        words_to_remove = [word for word, tag in txt_blob.tags if tag == 'NNP']
        # Rebuild the text without those words
        edited_sentence = ' '.join(word for word in text.split(' ') if word not in words_to_remove)
        # Show the result
        print(edited_sentence)
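The loop above only prints each cleaned string. If the goal is to keep the cleaned text in the DataFrame, one option is a function that returns the new values for a whole column. This is a sketch under assumptions: the names `remove_proper_nouns`, `clean_column` and `FAKE_TAGS` are hypothetical, the tagger is passed in as a parameter (in real use it would be something like `lambda s: TextBlob(s).tags`), and a hand-written lookup table stands in for TextBlob so the example runs without NLTK data.

```python
import string


def remove_proper_nouns(text, tagger, drop_tags=("NNP", "NNPS")):
    """Remove tokens tagged as proper nouns from a single cell.

    tagger is any callable returning (word, tag) pairs for a string.
    Non-string cells such as None or NaN are returned unchanged.
    """
    if not isinstance(text, str):
        return text
    proper = {word for word, tag in tagger(text) if tag in drop_tags}
    # split() with no argument splits on any whitespace, newlines included;
    # stripping punctuation lets the token "Boston." match the tagged word "Boston".
    kept = [tok for tok in text.split() if tok.strip(string.punctuation) not in proper]
    return " ".join(kept)


def clean_column(values, tagger):
    """Clean every value of a column; iterating a list or a pandas Series yields its values."""
    return [remove_proper_nouns(v, tagger) for v in values]


# Hypothetical tagger: a hand-written lookup table standing in for TextBlob.
FAKE_TAGS = {
    "The Blob ate Boston.": [("The", "DT"), ("Blob", "NNP"), ("ate", "VBD"), ("Boston", "NNP")],
    "no names\nhere": [("no", "DT"), ("names", "NNS"), ("here", "RB")],
}
column = ["The Blob ate Boston.", "no names\nhere", None]
print(clean_column(column, FAKE_TAGS.__getitem__))
```

With TextBlob and pandas installed, something like `result[col] = clean_column(result[col], lambda s: TextBlob(s).tags)` would overwrite each column with its cleaned text. Passing the tagger in keeps the cleaning logic testable on its own; the cost of `split()` plus `' '.join` is that line breaks collapse to spaces and punctuation attached to a removed name is dropped with it.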