Using WordNet with nltk to find synonyms that make sense
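I want to input a sentence and output a version of it in which the hard words are replaced with simpler ones.
I am using Nltk to tokenize sentences and tag words, but I am having trouble using WordNet to find synonyms for the specific sense of a word that I want.
For example:
Input:
"I refuse to pick up the refuse"
Maybe refuse #1 is already the simplest word for rejecting, but refuse #2 means garbage, and there are simpler words that could go there.
Nltk may be able to tag refuse #2 as a noun, but how do I then get synonyms for refuse (garbage) from WordNet?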
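It sounds like you want synonyms based on each word's part of speech (i.e. noun, verb, etc.).
The following creates synonyms for each word in a sentence based on its part of speech.
References: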
- Extract Word from Synset using Wordnet in NLTK 3.0
- Printing the part of speech along with the synonyms of the word
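Code
import nltk
from nltk.corpus import wordnet as wn

# Fetch the tokenizer, tagger and WordNet data (only needed once).
nltk.download('popular')

def get_synonyms(word, pos):
    """Yield the lemma names of every synset of word for the given Penn POS tag."""
    for synset in wn.synsets(word, pos=pos_to_wordnet_pos(pos)):
        for lemma in synset.lemmas():
            yield lemma.name()

def pos_to_wordnet_pos(penntag, returnNone=False):
    """Map a Penn Treebank POS tag to a WordNet POS tag."""
    morphy_tag = {'NN': wn.NOUN, 'JJ': wn.ADJ,
                  'VB': wn.VERB, 'RB': wn.ADV}
    try:
        # Only the first two letters are compared, so VBP -> VB, NNS -> NN, etc.
        return morphy_tag[penntag[:2]]
    except KeyError:
        # '' yields no synsets for unmapped tags; None would search every POS.
        return None if returnNone else ''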
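To see what the tag mapping does on its own, here is a minimal sketch that needs no corpus download. It assumes the WordNet POS constants are the single-letter strings 'n', 'v', 'a' and 'r' (the values of wn.NOUN, wn.VERB, wn.ADJ and wn.ADV). The names PENN_PREFIX_TO_WORDNET and penn_to_wordnet are made up for this illustration and are not part of nltk.

```python
# Single-letter WordNet POS codes, written out so the sketch runs without nltk.
# Assumption: these match the values of wn.NOUN, wn.VERB, wn.ADJ and wn.ADV.
PENN_PREFIX_TO_WORDNET = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}


def penn_to_wordnet(penn_tag, default=""):
    """Map a Penn Treebank tag to a WordNet POS code via its first two letters."""
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:2], default)


if __name__ == "__main__":
    # Show how a few common Penn tags are mapped.
    for tag in ["VBP", "NN", "NNS", "JJR", "RB", "PRP", "DT"]:
        print(tag, "->", repr(penn_to_wordnet(tag)))
```

Tags such as VBP and NNS resolve to the verb and noun codes because only the two-letter prefix is looked up. Tags with no entry, such as PRP and DT, map to the empty string, which is consistent with function words getting no synonyms.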
Example usage
# Tokenize the text
text = nltk.word_tokenize("I refuse to pick up the refuse")

for word, tag in nltk.pos_tag(text):
    print(f'word is {word}, POS is {tag}')

    # Keep unique synonyms that differ from the word itself, sorted.
    unique = sorted(set(synonym for synonym in get_synonyms(word, tag) if synonym != word))

    for synonym in unique:
        print('\t', synonym)
Output
Note that the two occurrences of refuse get different sets of synonyms based on their POS.
word is I, POS is PRP
word is refuse, POS is VBP
	 decline
	 defy
	 deny
	 pass_up
	 reject
	 resist
	 turn_away
	 turn_down
word is to, POS is TO
word is pick, POS is VB
	 beak
	 blame
	 break_up
	 clean
	 cull
	 find_fault
	 foot
	 nibble
	 peck
	 piece
	 pluck
	 plunk
word is up, POS is RP
word is the, POS is DT
word is refuse, POS is NN
	 food_waste
	 garbage
	 scraps
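Since the goal in the question is a simpler sentence, one possible next step is to rank the candidates and keep one of them. The sketch below is only an illustration under stated assumptions: the frequency counts are invented placeholder numbers, pick_simplest and TOY_FREQUENCY are hypothetical names, and "more frequent means simpler" is a heuristic rather than something WordNet provides. It also replaces the underscores that appear in multiword lemma names such as food_waste.

```python
# Placeholder counts for illustration only (assumption, not real data);
# a real system would load frequencies from a corpus or word list.
TOY_FREQUENCY = {"garbage": 900, "scraps": 300, "food waste": 50, "refuse": 40}


def pick_simplest(word, candidates, frequency):
    """Return the most frequent option among the word and its candidates."""
    # Multiword lemma names use underscores, e.g. 'food_waste'.
    options = [word] + [c.replace("_", " ") for c in candidates]
    # Unknown words count as 0. max() keeps the earliest option on ties,
    # so the original word wins when nothing is known to be more frequent.
    return max(options, key=lambda option: frequency.get(option, 0))


if __name__ == "__main__":
    noun_synonyms = ["food_waste", "garbage", "scraps"]
    print(pick_simplest("refuse", noun_synonyms, TOY_FREQUENCY))  # prints: garbage
```

The candidate list here is the noun output for refuse shown above; in practice it would come from get_synonyms(word, tag) for each tagged word.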