Synset function to include synonyms in a list
I need to iterate over a list and add the synonyms and hyponyms of each word back into the list. For example:
list_of_words = ["bird", "smart", "cool", "happy"]
list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms
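I can get the synonyms and hyponyms of a single word, but I need to iterate over a list of words.
s = wordnet.synsets(word)[0]
It needs to return a list in which each word's synonyms are added to the original list.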
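The expected result is:
list_of_words = ["bird", "smart", "cool", "happy", "hen", "cock", ...other synonyms of bird, "clever", "intelligent", ...other synonyms of smart, etc.]
How can I make the synset function iterate over list_of_words and include these words in the list? I'm new to text analysis. Any help is appreciated.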
EDIT: The output format has changed, based on the OP's comment.
Assuming you want output like this:
result = [
["bird", "smart", "cool", "happy"],
[["Synonym 1 of bird...", ...], ["Synonym 1 of smart...", ...], ["Synonym 1 of cool...", ...], ["Synonym 1 of happy...", ...]],
...
]
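A minimal sketch of building that nested structure, assuming the elided third element holds each word's hyponyms. `lookup` and `FAKE_WORDNET` are hypothetical: a small hard-coded dictionary stands in for WordNet, so its entries are placeholders rather than real WordNet output.

```python
# Hypothetical stand-in for WordNet: word -> (synonyms, hyponyms).
# The entries are illustrative placeholders, not real WordNet output.
FAKE_WORDNET = {
    "bird": (["bird", "dickeybird"], ["hen", "cock"]),
    "smart": (["smart", "clever"], []),
}


def lookup(word):
    """Hypothetical helper: return (synonyms, hyponyms), or two empty lists if unknown."""
    return FAKE_WORDNET.get(word, ([], []))


def build_nested(words):
    """Return [words, [synonyms per word], [hyponyms per word]]."""
    synonyms_per_word = []
    hyponyms_per_word = []
    for word in words:
        synonyms, hyponyms = lookup(word)
        synonyms_per_word.append(list(synonyms))
        hyponyms_per_word.append(list(hyponyms))
    return [list(words), synonyms_per_word, hyponyms_per_word]


result = build_nested(["bird", "smart", "cool", "happy"])
print(result)
```

Words with no entry get empty inner lists, so position `i` in each sub-list always refers to the `i`-th original word.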
New output format:
["bird", "smart", "cool", "happy", "synonym of bird", "hyponym of bird", "synonym of smart"... ]
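The flat format above can be sketched as follows, assuming a hypothetical `lookup(word)` that returns a `(synonyms, hyponyms)` pair. Here it just fabricates placeholder strings such as `"bird_syn1"` in place of real WordNet data.

```python
def lookup(word):
    """Hypothetical stand-in for a WordNet query: placeholder synonyms and hyponyms."""
    return [word + "_syn1"], [word + "_hyp1"]


def build_flat(words):
    """Original words first, then each word's synonyms followed by its hyponyms."""
    result = list(words)  # copy, so the caller's list is not mutated while we extend
    for word in words:
        synonyms, hyponyms = lookup(word)
        result.extend(synonyms)
        result.extend(hyponyms)
    return result


flat = build_flat(["bird", "smart", "cool", "happy"])
print(flat)
```

Using `extend` rather than `append` keeps the result flat instead of nesting one list per word.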
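You can iterate over the original list of words as follows:
from pattern.en import wordnet
list_of_words = ["bird", "smart", "cool", "happy"]
original_length = len(list_of_words)
# Iterate over a copy of the original words: appending to the list being
# iterated would make the loop also visit the lists appended below.
for word in list_of_words[:original_length]:
    s = wordnet.synsets(word)[0]
    # append synonyms list to the result
    list_of_words.append(s.synonyms)
    # append hyponyms list to the result
    list_of_words.append(s.hyponyms())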
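After the loop, you can access the list like this:
# Each word appended two entries (synonyms, then hyponyms), so word i's
# entries sit at original_length + 2*i and original_length + 2*i + 1.
for index in range(original_length):
    print('Displaying word %s' % list_of_words[index])
    print('Synonyms: %s' % str(list_of_words[original_length + 2 * index]))
    print('Hyponyms: %s' % str(list_of_words[original_length + 2 * index + 1]))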
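Because every word contributes two appended entries (its synonym list, then its hyponym list), word `i`'s entries land at `original_length + 2*i` and `original_length + 2*i + 1`. The sketch below checks that layout with hypothetical placeholder lists instead of WordNet objects.

```python
words = ["bird", "smart", "cool", "happy"]
original_length = len(words)

layout = list(words)
# layout[:original_length] is a copy taken once, so the appends below
# are not picked up by the loop.
for word in layout[:original_length]:
    layout.append([word + "_syn"])  # placeholder synonym list
    layout.append([word + "_hyp"])  # placeholder hyponym list

for index in range(original_length):
    synonyms = layout[original_length + 2 * index]
    hyponyms = layout[original_length + 2 * index + 1]
    print("%s: %s / %s" % (layout[index], synonyms, hyponyms))
```

A stride of 2 is needed because the synonyms and hyponyms of one word are interleaved with those of the next.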
Here is a quick implementation. Don't worry too much about fakesynsets; it is just a mock of wordnet.synsets, so you can skip straight to the code after this function.
def fakesynsets(word):
    # Mock of wordnet.synsets(): returns one fake synset with two synonyms and two hyponyms
    from collections import namedtuple
    FakeSynset = namedtuple('FakeSynset', ['synonyms', 'hyponyms'])
    return [FakeSynset(synonyms=[word + 'syn' + str(ii) for ii in range(1, 3)],
                       hyponyms=lambda: [word + 'hyp' + str(ii) for ii in range(1, 3)])]
list_of_words = ["bird", "smart", "cool", "happy"]
list_of_words_synonyms = []
list_of_words_hypnonyms = []
for word in list_of_words:
    s = fakesynsets(word)[0]
    list_of_words_synonyms.extend(s.synonyms)
    list_of_words_hypnonyms.extend(s.hyponyms())
list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms
print(list_of_words)
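(I'm posting this as a new answer instead of updating my existing one, because the question has changed a lot.)
After installing the "pattern" package and debugging, I finally figured out what wordnet.synsets() returns. Here is the working code: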
from pattern.en import wordnet
list_of_words = [u"bird", u"smart", u"cool", u"happy"]
list_of_words_synonyms = []
list_of_words_hypnonyms = []
for word in list_of_words:
    sts = wordnet.synsets(word)
    if sts:  # skip words that have no synsets
        st = sts[0]
        list_of_words_synonyms.extend(st.synonyms)
        list_of_words_hypnonyms.extend(hs.senses[0] for hs in st.hyponyms())
list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms
print(list_of_words)
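To try the control flow of the code above without installing pattern, here is a sketch that swaps in hypothetical stubs: `StubSynset` models only the `synonyms`, `senses` and `hyponyms()` members used above, and `stub_synsets` returns an empty list for unknown words, which is the case the `if sts:` guard protects against (indexing `sts[0]` on an empty list would raise `IndexError`).

```python
class StubSynset(object):
    """Hypothetical stand-in for a pattern.en Synset (only the members used above)."""

    def __init__(self, senses, hyponyms=()):
        self.synonyms = list(senses)
        self.senses = list(senses)
        self._hyponyms = list(hyponyms)

    def hyponyms(self):
        return list(self._hyponyms)


# Placeholder data, not real WordNet output.
STUB_DB = {
    "bird": [StubSynset(["bird"], [StubSynset(["hen"]), StubSynset(["cock", "rooster"])])],
}


def stub_synsets(word):
    """Hypothetical stand-in for wordnet.synsets(): empty list for unknown words."""
    return STUB_DB.get(word, [])


list_of_words = ["bird", "xyzzy"]
list_of_words_synonyms = []
list_of_words_hypnonyms = []
for word in list_of_words:
    sts = stub_synsets(word)
    if sts:  # "xyzzy" has no synsets and is skipped instead of raising IndexError
        st = sts[0]
        list_of_words_synonyms.extend(st.synonyms)
        list_of_words_hypnonyms.extend(hs.senses[0] for hs in st.hyponyms())

list_of_words = list_of_words + list_of_words_synonyms + list_of_words_hypnonyms
print(list_of_words)
```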
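Please note:
- Duplicates are not handled. If removing duplicates is a requirement, you can use the built-in set instead of a list (the old sets.Set class is deprecated and was removed in Python 3). Note that a set does not preserve the order of the words.
- Each hyponym has several senses, and list_of_words_hypnonyms only includes the first one. To include all of them, replace the corresponding line with:
    list_of_words_hypnonyms.extend(sense for hs in st.hyponyms() for sense in hs.senses)
- The hyponyms are added to list_of_words_hypnonyms with a generator expression.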
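The two notes above can be sketched together: taking the first sense versus all senses of each hyponym, and removing duplicates while keeping the original order (which a plain `set` would not). The `Hyponym` namedtuple is a hypothetical stand-in that models only the `senses` attribute, and the words are placeholders.

```python
from collections import namedtuple

# Hypothetical stand-in for a hyponym Synset: only `senses` is modelled.
Hyponym = namedtuple("Hyponym", ["senses"])
hyponyms = [Hyponym(["hen"]), Hyponym(["cock", "rooster"]), Hyponym(["hen"])]

# First sense only, as in the code above.
first_senses = [hs.senses[0] for hs in hyponyms]
# Every sense, as in the replacement line from the note.
all_senses = [sense for hs in hyponyms for sense in hs.senses]


def dedupe(items):
    """Drop repeated items while keeping first-seen order."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


words = dedupe(["bird", "smart"] + ["bird", "smart", "smarting"] + all_senses)
print(words)
```

The `seen` set gives fast membership checks, while the `unique` list preserves the order in which words were first added.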
The result is:
[u'bird', u'smart', u'cool', u'happy', u'bird', u'smart', u'smarting', u'smartness', u'cool', u'dickeybird', u'cock', u'hen', u'nester', u'night bird', u'bird of passage', u'protoavis', u'archaeopteryx', u'Sinornis', u'Ibero-mesornis', u'archaeornis', u'ratite', u'carinate', u'passerine', u'nonpasserine bird', u'bird of prey', u'gallinaceous bird', u'parrot', u'cuculiform bird', u'coraciiform bird', u'apodiform bird', u'caprimulgiform bird', u'piciform bird', u'trogon', u'aquatic bird', u'twitterer']