NLTK and Pandas - adding synsets into a list
I want to create a list and add it to a data frame as a new row.
import nltk
import numpy as np
import pandas as pd
from nltk.corpus import wordnet

Overviewdataframe = pd.DataFrame([])
synonyms = []
for syn in wordnet.synsets("active"):
    for l in syn.lemmas():
        synonyms.append(l.name())
    # Note: DataFrame.append was removed in pandas 2.0, so this line runs only on older versions.
    Overviewdataframe = Overviewdataframe.append(synonyms)
    synonyms = []
Instead, it adds the rows as columns. Can you help me?
Thanks.
TL;DR
import pandas as pd
from nltk.corpus import wordnet as wn

wordlist = ['active', 'fan', 'hop', 'grace']
# One record per (word, synset) pair; each record becomes one row of the data frame.
words2lemmanames = [{'word': word, 'synset': ss.name(), 'lemma_names': ss.lemma_names()}
                    for word in wordlist for ss in wn.synsets(word)]
pd.DataFrame(words2lemmanames)
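If one lemma per column is what you are after, a minimal sketch (not necessarily the layout the asker needs) is to collect the lemma lists first and build the frame once. The second list below is a hypothetical placeholder, not real WordNet output; in practice each list would come from a synset's lemma names.

```python
import pandas as pd

# Each inner list stands for the lemma names of one synset.
lemma_lists = [
    ["active_agent", "active"],         # lemma names of active_agent.n.01
    ["lemma_a", "lemma_b", "lemma_c"],  # hypothetical placeholder values
]

# Build the frame in one call; every inner list becomes one row,
# and shorter rows are padded with missing values.
df = pd.DataFrame(lemma_lists)
print(df.shape)
```

Building the frame once from a list of rows also avoids growing a data frame inside a loop.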
In Long
When querying the WordNet interface in NLTK, looking up a word returns a list of "concepts", also known as "synsets".
>>> wn.synsets('active')
[Synset('active_agent.n.01'), Synset('active_voice.n.01'), Synset('active.n.03'), Synset('active.a.01'), Synset('active.s.02'), Synset('active.a.03'), Synset('active.s.04'), Synset('active.a.05'), Synset('active.a.06'), Synset('active.a.07'), Synset('active.s.08'), Synset('active.a.09'), Synset('active.a.10'), Synset('active.a.11'), Synset('active.a.12'), Synset('active.a.13'), Synset('active.a.14')]
Each synset has its own list of lemma names, i.e.
>>> wn.synsets('active')[0].lemma_names()
['active_agent', 'active']
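You can also access a synset directly by its "name". The usual naming convention is (i) the first lemma name followed by a dot, (ii) the POS tag followed by a dot, and (iii) an index number.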
>>> wn.synsets('active')[0] == wn.synset('active_agent.n.01')
True
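To make the naming convention concrete, here is a small sketch that splits a synset name into its three parts using plain string handling. The helper `split_synset_name` is a hypothetical name introduced for illustration; it is not part of NLTK.

```python
def split_synset_name(name):
    """Split a synset name such as 'active_agent.n.01' into (lemma, pos, index)."""
    # Split from the right so that a lemma containing dots stays in one piece.
    lemma, pos, index = name.rsplit(".", 2)
    return lemma, pos, int(index)


parts = split_synset_name("active_agent.n.01")
print(parts)  # ('active_agent', 'n', 1)
```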
Finally, given a list of key-value mappings (i.e. dictionary objects), you can pass it to pandas.DataFrame
to convert it into a data frame.
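As a rough illustration that does not need the WordNet corpus, the sketch below feeds hand-written records to pandas.DataFrame. The first record mirrors the lookup shown above; the second is a hypothetical placeholder, not real WordNet output.

```python
import pandas as pd

records = [
    # Mirrors wn.synsets('active')[0] from the example above.
    {"word": "active", "synset": "active_agent.n.01",
     "lemma_names": ["active_agent", "active"]},
    # Hypothetical placeholder row for illustration only.
    {"word": "example", "synset": "example.n.99",
     "lemma_names": ["example", "sample"]},
]

# Each dictionary becomes one row; the dictionary keys become the column names.
df = pd.DataFrame(records)
print(list(df.columns))
```

Note that the lemma_names column holds a Python list in each cell, which is convenient for display but may need flattening before further analysis.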