Using WordNet with nltk to find synonyms that make sense

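I want to take a sentence as input and output a sentence in which the hard words are replaced with simpler ones.

I'm using NLTK to tokenize sentences and tag words, but I'm having trouble using WordNet to find a synonym for the specific sense of the word I want.

For example:

Input: "I refuse to pick up the refuse"

Maybe refuse #1 is already the simplest word for rejecting, but refuse #2 means garbage, and there are simpler words that could go there.

NLTK may be able to tag refuse #2 as a noun, but then how do I get the synonyms for refuse (garbage) from WordNet?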

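It sounds like you want synonyms based on each word's part of speech (i.e., noun, verb, etc.).

The following generates synonyms for each word in a sentence based on its part of speech. References: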

  1. Extract Word from Synset using Wordnet in NLTK 3.0
  2. Printing the part of speech along with the synonyms of the word

Code

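import nltk
from nltk.corpus import wordnet as wn

# One-time download of the tokenizer, tagger, and WordNet data.
nltk.download('popular')

def get_synonyms(word, pos):
    """Yield WordNet lemma names for word, restricted to the given Penn Treebank POS tag."""
    for synset in wn.synsets(word, pos=pos_to_wordnet_pos(pos)):
        for lemma in synset.lemmas():
            yield lemma.name()

def pos_to_wordnet_pos(penntag, returnNone=False):
    """Map a Penn Treebank POS tag to a WordNet POS tag."""
    morphy_tag = {'NN': wn.NOUN, 'JJ': wn.ADJ,
                  'VB': wn.VERB, 'RB': wn.ADV}
    try:
        # Only the first two characters matter: VBP, VBD, VBZ are all verbs.
        return morphy_tag[penntag[:2]]
    except KeyError:
        # No WordNet equivalent (e.g. PRP, DT): with '' the lookup yields no
        # synsets, which is why such words have no synonyms in the output.
        return None if returnNone else ''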
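
To make the tag mapping easier to follow, here is a minimal standalone sketch of the same idea. It assumes only that WordNet's POS constants are the single-character strings 'n', 'v', 'a', and 'r', which are written out by hand so the snippet runs without NLTK. The names PENN_PREFIX_TO_WORDNET and penn_to_wordnet are hypothetical and not part of NLTK.

```python
# WordNet's POS constants are single-character strings; they are written
# out here so the sketch runs without NLTK installed.
WORDNET_NOUN, WORDNET_VERB, WORDNET_ADJ, WORDNET_ADV = 'n', 'v', 'a', 'r'

# Penn Treebank tags share a two-letter prefix per word class
# (NN, NNS, NNP are nouns; VB, VBP, VBD are verbs; and so on).
PENN_PREFIX_TO_WORDNET = {
    'NN': WORDNET_NOUN,
    'VB': WORDNET_VERB,
    'JJ': WORDNET_ADJ,
    'RB': WORDNET_ADV,
}

def penn_to_wordnet(penn_tag):
    """Hypothetical helper: return the WordNet POS for a Penn tag, or None."""
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:2])

for tag in ['VBP', 'NN', 'NNS', 'JJR', 'PRP', 'DT']:
    print(tag, '->', penn_to_wordnet(tag))
```

Tags such as PRP (pronoun) and DT (determiner) have no entry in the table, so the helper returns None for them. Note that this differs from the function above, which returns '' by default so that such words get no synonyms at all.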

Example usage

# Tokenize text
text = nltk.word_tokenize("I refuse to pick up the refuse")

for word, tag in nltk.pos_tag(text):
  print(f'word is {word}, POS is {tag}')

  # Filter for unique synonyms not equal to word and sort.
  unique = sorted(set(synonym for synonym in get_synonyms(word, tag) if synonym != word))

  for synonym in unique:
    print('\t', synonym)

Output

Note that refuse gets a different set of synonyms depending on its POS.

word is I, POS is PRP
word is refuse, POS is VBP
     decline
     defy
     deny
     pass_up
     reject
     resist
     turn_away
     turn_down
word is to, POS is TO
word is pick, POS is VB
     beak
     blame
     break_up
     clean
     cull
     find_fault
     foot
     nibble
     peck
     piece
     pluck
     plunk
word is up, POS is RP
word is the, POS is DT
word is refuse, POS is NN
     food_waste
     garbage
     scraps
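
The filtering step in the example usage (unique synonyms, not equal to the word, sorted) can also be sketched as a standalone function. WordNet joins multiword lemma names with underscores, as in pass_up above, so this sketch additionally turns them into spaces to make the candidates usable in a sentence. The helper name clean_synonyms is hypothetical, and comparing case-insensitively is an assumption that goes beyond the original `!=` check.

```python
def clean_synonyms(word, candidates):
    """Hypothetical helper: unique, sorted, readable synonyms that differ from word."""
    seen = set()
    for candidate in candidates:
        # WordNet lemma names use underscores for multiword expressions.
        readable = candidate.replace('_', ' ')
        # Assumption: treat 'Refuse' and 'refuse' as the same word.
        if readable.lower() != word.lower():
            seen.add(readable)
    return sorted(seen)

raw = ['refuse', 'decline', 'pass_up', 'decline', 'Refuse', 'turn_down']
print(clean_synonyms('refuse', raw))
# prints ['decline', 'pass up', 'turn down']
```

In the example usage, the generator returned by get_synonyms(word, tag) could be passed as candidates in place of the hand-written list.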