Deleting all the noun phrases from text using Textblob

Question

I need to delete all proper nouns from the text. `result` is a DataFrame. I am using TextBlob. Below is the code.

from textblob import TextBlob

strings = []
for col in result:
    for i in range(result.shape[0]):
        text = result[col][i]
        Txtblob = TextBlob(text)

        for word, pos in Txtblob.noun_phrases:
            print (word, pos)
            if tag != 'NNP'
                print(' '.join(edited_sentence))

It only recognizes one NNP.

Answer 1

To remove all words tagged 'NNP' from the following text (taken from the documentation), you can do the following:

from textblob import TextBlob

# Sample text
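text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''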

blob = TextBlob(text)

# Create a list of words that are tagged with 'NNP' (proper noun, singular)
# In this case it will only be 'Blob'
words_to_remove = [word for word, tag in blob.tags if tag == 'NNP']

# Remove the words from the sentence, using words_to_remove
edited_sentence = ' '.join([word for word in text.split(' ') if word not in words_to_remove])

# Show the result
print(edited_sentence)

Output

# Notice the lack of the word 'Blob'
'\nThe titular threat of The has always struck me as the ultimate
 movie\nmonster: an insatiably hungry, amoeba-like mass able to 
 penetrate\nvirtually any safeguard, capable of--as a doomed doctor 
 chillingly\ndescribes it--"assimilating flesh on contact.\nSnide 
 comparisons to gelatin be damned, it\'s a concept with the 
 most\ndevastating of potential consequences, not unlike the grey goo 
 scenario\nproposed by technological theorists fearful of\nartificial 
 intelligence run rampant.\n'
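
One limitation of splitting on ' ' is that punctuation and newlines stay attached to the tokens, so a proper noun written as "Blob," or followed by a line break would not match 'Blob' and would be left in. Below is a minimal sketch of a more tolerant variant. `remove_tagged_words` is a hypothetical helper, not part of TextBlob; it expects the (word, tag) pairs in the shape that the `tags` property returns, and dropping 'NNPS' (plural proper nouns) along with 'NNP' is my assumption about what is wanted.

```python
import re


def remove_tagged_words(text, tagged_words, drop_tags=("NNP", "NNPS")):
    """Remove every word from text whose tag is in drop_tags.

    tagged_words is an iterable of (word, tag) pairs.
    """
    to_remove = {word for word, tag in tagged_words if tag in drop_tags}
    if not to_remove:
        return text

    # Match whole words only, so removing 'Blob' leaves 'Blobby' alone.
    # Longer words go first so the alternation prefers the full word.
    alternatives = "|".join(
        re.escape(word) for word in sorted(to_remove, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    cleaned = pattern.sub("", text)

    # Tidy up: collapse runs of spaces and drop a space left before punctuation.
    cleaned = re.sub(r"[ \t]+", " ", cleaned)
    cleaned = re.sub(r" ([,.;:!?])", r"\1", cleaned)
    return cleaned.strip()


# Hand-written pairs standing in for real tagger output (illustrative only)
sample_tags = [("The", "DT"), ("Blob", "NNP"), ("ate", "VBD"), ("London", "NNP")]
print(remove_tagged_words("The Blob, they say, ate London.", sample_tags))
```

With TextBlob installed, the pairs would come from the blob itself, e.g. `remove_tagged_words(text, TextBlob(text).tags)`. Like the list-based version, this removes every occurrence of a word that was tagged as a proper noun at least once, even where the same word is tagged differently elsewhere in the text.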

Comments on your example

from textblob import TextBlob

strings = [] # This variable is not used anywhere
for col in result:
    for i in range(result.shape[0]):
        text = result[col][i]
        txt_blob = TextBlob(text)

        # txt_blob.noun_phrases returns a list of noun phrases (plain strings, without tags).
        # To get the position of each phrase in that list, use the function 'enumerate', like this:
        for pos, phrase in enumerate(txt_blob.noun_phrases):

            # Now you can print the position and the phrase
            print(pos, phrase)
            # This will give you something like the following:
            # 0 titular threat
            # 1 blob
            # 2 ultimate movie monster

            # The following line does not make any sense, because tag has not yet been assigned
            # and you are not iterating over the words from the previous step
            # (the trailing colon is also missing)
            if tag != 'NNP'
                # You are not assigning anything to edited_sentence, so this would not work either.
                print(' '.join(edited_sentence))
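
To make the difference concrete, here is a small sketch in plain Python, with no TextBlob required. The two lists are hand-written stand-ins (assumed values, for illustration only) for the shapes of `noun_phrases` (strings) and `tags` ((word, tag) pairs). It shows why unpacking two names from a noun phrase fails, while the same loop over tag pairs gives you something to filter on.

```python
# Illustrative stand-ins (assumed values for this demo):
# noun_phrases is a list of plain strings, tags is a list of (word, tag) pairs.
noun_phrases = ["titular threat", "blob", "ultimate movie monster"]
tags = [("The", "DT"), ("Blob", "NNP"), ("has", "VBZ")]

# Unpacking a phrase string into two names fails unless the string
# happens to have exactly two characters.
try:
    for word, pos in noun_phrases:
        pass
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

# Unpacking the (word, tag) pairs works and makes the tag available for filtering.
proper_nouns = [word for word, tag in tags if tag == "NNP"]

print(unpack_error)
print(proper_nouns)
```

So for part-of-speech filtering, the tag pairs are the thing to loop over; the noun phrases carry no tag information at all.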

Example with new code

from textblob import TextBlob

for col in result:
    # Iterate over the cell values directly instead of indexing by position
    for text in result[col]:
        txt_blob = TextBlob(text)

        # Create a list of words that are tagged with 'NNP' (proper noun, singular)
        words_to_remove = [word for word, tag in txt_blob.tags if tag == 'NNP']

        # Remove the words from the sentence, using words_to_remove
        edited_sentence = ' '.join([word for word in text.split(' ') if word not in words_to_remove])

        # Show the result
        print(edited_sentence)
