Why does the aligned list of words print duplicates?
I am trying to implement the Sultan Monolingual Aligner, using NLTK wordnet synsets to find synonym sets.
I have two lists:
word1 = ['move', 'buy','learn']
word2 = ['study', 'purchase']
According to the alignment rule, if the synonym set of word1[i] in word1 is similar to the synonym set of word2[j] in word2, then word1[i] and word2[j] are aligned.
Here is my code:
# Python 2 code (xrange, print statement)
from nltk.corpus import wordnet as wn

def getSynonyms(word):
    # Collect the unique lemma names of every synset of `word`
    synonymList1 = []
    wordnetSynset1 = wn.synsets(word)
    tempList1=[]
    for synset1 in wordnetSynset1:
        synLemmas = synset1.lemma_names()
        for i in xrange(len(synLemmas)):
            word = synLemmas[i].replace('_',' ')
            if word not in tempList1:
                tempList1.append(word)
    synonymList1.append(tempList1)
    return synonymList1

def cekSynonyms(word1, word2):
    newlist = []
    for i in xrange(len(word1)):
        for j in xrange(len(word2)):
            getsyn1 = getSynonyms(word1[i])
            getsyn2 = getSynonyms(word2[j])
            # Flatten the nested synonym lists
            ds1 = [x for y in getsyn1 for x in y]
            ds2 = [x for y in getsyn2 for x in y]
            print ds1,"---align to--->",ds2,"\n"
            for k in xrange(len(ds1)):
                for l in xrange(len(ds2)):
                    if ds1[k] == ds2[l]:
                        #newsim = [ds1[k], ds2[l]]
                        newsim = [word1[i], word2[j]]
                        newlist.append(newsim)
    return newlist

word1 = ['move', 'buy','learn']
word2 = ['study', 'purchase']
print cekSynonyms(word1, word2)
Yes, I can find the synonym set of each word. Here is the output:
[u'move', u'relocation', u'motion', u'movement', u'motility', u'travel', u'go', u'locomote', u'displace', u'proceed', u'be active', u'act', u'affect', u'impress', u'strike', u'motivate', u'actuate', u'propel', u'prompt', u'incite', u'run', u'make a motion'] ---align to---> [u'survey', u'study', u'work', u'report', u'written report', u'discipline', u'subject', u'subject area', u'subject field', u'field', u'field of study', u'bailiwick', u'sketch', u'cogitation', u'analyze', u'analyse', u'examine', u'canvass', u'canvas', u'consider', u'learn', u'read', u'take', u'hit the books', u'meditate', u'contemplate']
[u'move', u'relocation', u'motion', u'movement', u'motility', u'travel', u'go', u'locomote', u'displace', u'proceed', u'be active', u'act', u'affect', u'impress', u'strike', u'motivate', u'actuate', u'propel', u'prompt', u'incite', u'run', u'make a motion'] ---align to---> [u'purchase', u'leverage', u'buy']
[u'bargain', u'buy', u'steal', u'purchase', u'bribe', u'corrupt', u"grease one's palms"] ---align to---> [u'survey', u'study', u'work', u'report', u'written report', u'discipline', u'subject', u'subject area', u'subject field', u'field', u'field of study', u'bailiwick', u'sketch', u'cogitation', u'analyze', u'analyse', u'examine', u'canvass', u'canvas', u'consider', u'learn', u'read', u'take', u'hit the books', u'meditate', u'contemplate']
[u'bargain', u'buy', u'steal', u'purchase', u'bribe', u'corrupt', u"grease one's palms"] ---align to---> [u'purchase', u'leverage', u'buy']
[u'learn', u'larn', u'acquire', u'hear', u'get word', u'get wind', u'pick up', u'find out', u'get a line', u'discover', u'see', u'memorize', u'memorise', u'con', u'study', u'read', u'take', u'teach', u'instruct', u'determine', u'check', u'ascertain', u'watch'] ---align to---> [u'survey', u'study', u'work', u'report', u'written report', u'discipline', u'subject', u'subject area', u'subject field', u'field', u'field of study', u'bailiwick', u'sketch', u'cogitation', u'analyze', u'analyse', u'examine', u'canvass', u'canvas', u'consider', u'learn', u'read', u'take', u'hit the books', u'meditate', u'contemplate']
[u'learn', u'larn', u'acquire', u'hear', u'get word', u'get wind', u'pick up', u'find out', u'get a line', u'discover', u'see', u'memorize', u'memorise', u'con', u'study', u'read', u'take', u'teach', u'instruct', u'determine', u'check', u'ascertain', u'watch'] ---align to---> [u'purchase', u'leverage', u'buy']
[['buy', 'purchase'], ['buy', 'purchase'], ['learn', 'study'], ['learn', 'study'], ['learn', 'study'], ['learn', 'study']]
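The first 6 lines above show each word in word1 and word2 being compared through their synonym sets.
The last line is the list of aligned words.
From the synonym sets you can see that ['buy','purchase'] and ['learn','study'] are the aligned words.
But why is each pair printed more than once? Like this >> [['buy', 'purchase'], ['buy', 'purchase'], ['learn', 'study'], ['learn', 'study'], ['learn', 'study'], ['learn', 'study']]
How can I print each pair only once, without duplicates? Like this >> [['buy','purchase'], ['learn','study']]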
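You can remove duplicates from such a list by converting it to a set, but since lists are not hashable, you have to go through tuples along the way: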
a = [['buy', 'purchase'], ['buy', 'purchase'], ['learn', 'study'], \
['learn', 'study'], ['learn', 'study'], ['learn', 'study']]
a = [list(x) for x in set([tuple(x) for x in a])]
print(a)
This gives:
[['buy', 'purchase'], ['learn', 'study']]
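One caveat: a set does not guarantee any particular order, so the pairs may come back in a different order than they were found. If the order matters, a sketch along these lines keeps the first occurrence of each pair. The name `dedupe_pairs` is a hypothetical helper chosen for illustration, not something from NLTK or the original code.

```python
def dedupe_pairs(pairs):
    """Return the pairs with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for pair in pairs:
        key = tuple(pair)  # lists are unhashable, tuples are hashable
        if key not in seen:
            seen.add(key)
            unique.append(list(pair))
    return unique


a = [['buy', 'purchase'], ['buy', 'purchase'], ['learn', 'study'],
     ['learn', 'study'], ['learn', 'study'], ['learn', 'study']]
print(dedupe_pairs(a))  # [['buy', 'purchase'], ['learn', 'study']]
```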
Based on nbubis's answer, I wrote a tuple function:
def tupleSynonyms(word1, word2):
    a = cekSynonyms(word1, word2)
    anew = [list(x) for x in set([tuple(x) for x in a])]
    return anew

print tupleSynonyms(word1, word2)
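As for why the duplicates appear in the first place: the two inner loops append one pair for every synonym the two words share. In the output above, 'buy' and 'purchase' share two entries (buy, purchase) and 'learn' and 'study' share four (learn, study, read, take), hence two and four copies. An alternative to cleaning up afterwards is to test each word pair only once with a set intersection. The following is a minimal sketch under assumptions: `align_words` and `TOY_SYNONYMS` are hypothetical names, and the small dictionary is an abbreviated stand-in for the WordNet lookup so the example runs without NLTK data.

```python
def align_words(words1, words2, get_synonyms):
    """Pair up words whose synonym lists overlap, one pair per match.

    get_synonyms is any callable mapping a word to a flat list of synonyms.
    """
    aligned = []
    for w1 in words1:
        syn1 = set(get_synonyms(w1))
        for w2 in words2:
            # A non-empty intersection means at least one shared synonym
            if syn1 & set(get_synonyms(w2)):
                aligned.append([w1, w2])
    return aligned


# Abbreviated stand-in for WordNet (assumption, for illustration only)
TOY_SYNONYMS = {
    'move': ['move', 'relocation', 'motion', 'travel', 'go'],
    'buy': ['bargain', 'buy', 'steal', 'purchase', 'bribe'],
    'learn': ['learn', 'acquire', 'study', 'read', 'take'],
    'study': ['survey', 'study', 'learn', 'read', 'take'],
    'purchase': ['purchase', 'leverage', 'buy'],
}

word1 = ['move', 'buy', 'learn']
word2 = ['study', 'purchase']
result = align_words(word1, word2, lambda w: TOY_SYNONYMS.get(w, []))
print(result)  # [['buy', 'purchase'], ['learn', 'study']]
```

With the real lookup, something like `lambda w: getSynonyms(w)[0]` could presumably be passed as `get_synonyms`, since `getSynonyms` wraps its flat list in an outer list. Repeated words in the input lists would still yield repeated pairs.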