How to print all lemma_names of a word without repeating its synonyms and pos_tag more than once in NLTK synsets?
I am trying to find the synsets of a word. Here is my code:
from nltk.corpus import wordnet as wn
from nltk import pos_tag

def getSynonyms(word1):
    synonymList1 = []
    for data1 in word1:
        wordnetSynset1 = wn.synsets(data1)
        tempList1 = []
        for synset1 in wordnetSynset1:
            synLemmas = synset1.lemma_names()
            for i in xrange(len(synLemmas)):
                word = synLemmas[i].replace('_', ' ')
                tempList1.append(pos_tag(word.split()))
        synonymList1.append(tempList1)
    return synonymList1

word1 = ['study']
syn1 = getSynonyms(word1)
print syn1
This is the output:
[[[(u'survey', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'work', 'NN')], [(u'report', 'NN')], [(u'study', 'NN')], [(u'written', 'VBN'), (u'report', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'discipline', 'NN')], [(u'subject', 'NN')], [(u'subject', 'JJ'), (u'area', 'NN')], [(u'subject', 'JJ'), (u'field', 'NN')], [(u'field', 'NN')], [(u'field', 'NN'), (u'of', 'IN'), (u'study', 'NN')], [(u'study', 'NN')], [(u'bailiwick', 'NN')], [(u'sketch', 'NN')], [(u'study', 'NN')], [(u'cogitation', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'analyze', 'NN')], [(u'analyse', 'NN')], [(u'study', 'NN')], [(u'examine', 'NN')], [(u'canvass', 'NN')], [(u'canvas', 'NN')], [(u'study', 'NN')], [(u'study', 'NN')], [(u'consider', 'VB')], [(u'learn', 'NN')], [(u'study', 'NN')], [(u'read', 'NN')], [(u'take', 'VB')], [(u'study', 'NN')], [(u'hit', 'VB'), (u'the', 'DT'), (u'books', 'NNS')], [(u'study', 'NN')], [(u'meditate', 'NN')], [(u'contemplate', 'NN')]]]
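As we can see, (u'study', 'NN') appears more than once.
How can I print each synonym only once, without repetition, so that every synonym is represented by a single entry?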
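On the line
    tempList1.append(pos_tag(word.split()))
you always append to the list inside the for loop. Instead, check whether the element you are about to add is already in the list. A simple if statement is enough:
    if pos_tag(word.split()) not in tempList1:
        tempList1.append(pos_tag(word.split()))
This way an element is never added twice.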
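The membership check above can also be sketched as a small standalone helper that remembers what it has already seen in a set, so each tagged phrase is looked up once and the original order is kept. This is only an illustration: the name `unique_in_order` is hypothetical, and the sample data is a handful of entries copied from the output in the question, so NLTK is not needed to try it.

```python
def unique_in_order(tagged_items):
    """Return tagged_items without duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for item in tagged_items:
        # Lists are unhashable, so use a tuple of the (word, tag) pairs as the key.
        key = tuple(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


# A few entries copied from the question's output (hypothetical stand-in
# for one inner list produced by getSynonyms).
sample = [
    [('survey', 'NN')],
    [('study', 'NN')],
    [('study', 'NN')],
    [('work', 'NN')],
    [('written', 'VBN'), ('report', 'NN')],
    [('study', 'NN')],
]

deduped = unique_in_order(sample)
print(deduped)
```

Inside getSynonyms, the same idea would mean tagging each lemma once, storing the result in a variable, and appending it only when its tuple form has not been seen yet.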
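    syn1 = set(getSynonyms(word1))
Putting the returned list into a set removes duplicates. I am assuming here that order does not matter, since sets have no defined order. Note that set() needs hashable elements, so with the nested lists returned by getSynonyms the inner lists would first have to be converted to tuples; otherwise this line raises a TypeError.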
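A minimal sketch of the set-based approach, assuming the result has the nested shape shown in the question (one inner list per input word, each holding lists of (word, tag) pairs). The variable `nested` is a hypothetical stand-in for the return value of getSynonyms, filled with a few entries from the question's output.

```python
# Hypothetical stand-in for getSynonyms(['study']), using entries
# copied from the output in the question.
nested = [[
    [('survey', 'NN')],
    [('study', 'NN')],
    [('study', 'NN')],
    [('field', 'NN'), ('of', 'IN'), ('study', 'NN')],
    [('study', 'NN')],
]]

# Lists cannot be members of a set, so calling set() on them directly fails.
try:
    set(nested)
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# Convert each tagged phrase to a tuple, then build one set per input word.
unique_per_word = [
    set(tuple(tagged) for tagged in per_word)
    for per_word in nested
]

print(direct_call_failed)
print(len(unique_per_word[0]))
```

The result is a list of sets of tuples rather than a list of lists, and the order of the synonyms is not preserved.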