Word sense disambiguation with WordNet. How to select the words related to the same meaning?
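I'm using WordNet and NLTK for word sense disambiguation. I'm interested in all words related to sound. I have a list of such words, and 'roll' is one of them. I then check whether any of my sentences contains that word (I also check its POS). If one does, I want to select only the sentences in which the word relates to sound. In the example below, that would be the second sentence. My current idea is to select words whose definition contains the word 'sound', such as 'the sound of a drum (especially a snare drum) beaten rapidly and continuously'. But I suspect there is a more elegant way. Any ideas would be greatly appreciated!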
from nltk.wsd import lesk
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

samples = [('The van rolled along the highway.', 'n'),
           ('The thunder rolled and the lightning striked.', 'n')]
word = 'roll'
for sentence, pos_tag in samples:
    # lesk() picks the WordNet sense of `word` whose gloss overlaps most with the sentence tokens
    word_syn = lesk(word_tokenize(sentence.lower()), word, pos_tag)
    print('Sentence:', sentence)
    print('Word synset:', word_syn)
    print('Corresponding definition:', word_syn.definition())
Output:
Sentence: The van rolled along the highway.
Word synset: Synset('scroll.n.02')
Corresponding definition: a document that can be rolled up (as for storage)
Sentence: The thunder rolled and the lightning striked.
Word synset: Synset('paradiddle.n.01')
Corresponding definition: the sound of a drum (especially a snare drum) beaten rapidly and continuously
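For reference, the definition-keyword check described in the question could look roughly like this. It is a minimal sketch: `definition_mentions` and the `FakeSynset` stand-in are hypothetical names introduced here, and the only NLTK method relied on is `Synset.definition()`, so the same function should accept the objects returned by `lesk`. Matching on a whole word avoids hits such as "soundproof", but it still misses glosses that describe a sound without using the word itself.

```python
import re


def definition_mentions(synset, keyword="sound"):
    """Return True if the synset's gloss contains `keyword` as a whole word.

    `synset` can be any object with a definition() method, e.g. an NLTK
    Synset. None is accepted because nltk.wsd.lesk returns None when the
    word has no synsets for the requested POS.
    """
    if synset is None:
        return False
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, synset.definition(), re.IGNORECASE) is not None


class FakeSynset:
    """Hypothetical stand-in so the sketch runs without the WordNet data."""

    def __init__(self, gloss):
        self._gloss = gloss

    def definition(self):
        return self._gloss


# Glosses taken from the lesk output shown above.
scroll = FakeSynset("a document that can be rolled up (as for storage)")
paradiddle = FakeSynset(
    "the sound of a drum (especially a snare drum) beaten rapidly and continuously"
)
print(definition_mentions(scroll))      # → False
print(definition_mentions(paradiddle))  # → True
```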
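You can use WordNet hypernyms (synsets with a more general meaning). My first idea would be to walk upward from the current synset (using synset.hypernyms()) and keep checking whether I find a "sound" synset. I would stop when I reach the root (a synset with no hypernyms, i.e. synset.hypernyms() returns an empty list).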
For your two examples, this produces the following synset sequences:
Sentence:The van rolled along the highway .
Word synset:Synset('scroll.n.02')
[Synset('manuscript.n.02')]
[Synset('autograph.n.01')]
[Synset('writing.n.02')]
[Synset('written_communication.n.01')]
[Synset('communication.n.02')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]
Sentence:The thunder rolled and the lightning striked .
Word synset:Synset('paradiddle.n.01')
[Synset('sound.n.04')]
[Synset('happening.n.01')]
[Synset('event.n.01')]
[Synset('psychological_feature.n.01')]
[Synset('abstraction.n.06')]
[Synset('entity.n.01')]
So one of the synsets you might want to look for is sound.n.04. There may be others, though; you could try more examples and build up a list of target synsets.
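The hypernym walk from this answer, checked against a list of target synsets, could be sketched as follows. This is a sketch under stated assumptions rather than a library routine: `find_target_ancestor`, `FakeSynset` and `chain` are hypothetical names, and the only NLTK `Synset` methods relied on are `name()` and `hypernyms()`. Because `hypernyms()` can return more than one parent, the walk is breadth-first over all of them instead of following a single chain, and it ends once every path has reached a root (an empty `hypernyms()` list).

```python
from collections import deque


def find_target_ancestor(synset, target_names):
    """Breadth-first walk up the hypernym graph of `synset`.

    Returns the first name() found in `target_names` (the start synset
    counts too), or None once every path has reached a root, i.e. a
    synset whose hypernyms() list is empty.
    """
    if synset is None:  # e.g. lesk() found no synset for the word/POS
        return None
    queue = deque([synset])
    seen = set()
    while queue:
        current = queue.popleft()
        name = current.name()
        if name in seen:  # a synset can be reached through several parents
            continue
        seen.add(name)
        if name in target_names:
            return name
        queue.extend(current.hypernyms())
    return None


class FakeSynset:
    """Hypothetical stand-in exposing the two Synset methods used above."""

    def __init__(self, name, parent=None):
        self._name = name
        self._parents = [parent] if parent is not None else []

    def name(self):
        return self._name

    def hypernyms(self):
        return list(self._parents)


def chain(*names):
    """Build a linear hypernym chain; the first name is the leaf."""
    node = None
    for n in reversed(names):
        node = FakeSynset(n, node)
    return node


# Chains copied from the hypernym sequences listed in the answer.
scroll = chain("scroll.n.02", "manuscript.n.02", "autograph.n.01", "writing.n.02",
               "written_communication.n.01", "communication.n.02",
               "abstraction.n.06", "entity.n.01")
paradiddle = chain("paradiddle.n.01", "sound.n.04", "happening.n.01", "event.n.01",
                   "psychological_feature.n.01", "abstraction.n.06", "entity.n.01")

SOUND_TARGETS = {"sound.n.04"}  # extend this set as you find more sound-related synsets
print(find_target_ancestor(scroll, SOUND_TARGETS))      # → None
print(find_target_ancestor(paradiddle, SOUND_TARGETS))  # → sound.n.04
```

With NLTK and the WordNet corpus installed, you would pass a real synset instead, for example `find_target_ancestor(lesk(tokens, 'roll', 'n'), SOUND_TARGETS)`; the `None` check covers the case where `lesk` finds no synset. NLTK's `Synset.closure(lambda s: s.hypernyms())` yields the same set of ancestors if you prefer not to write the loop yourself.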