Create a dataframe with NLTK synonyms
Good morning,
I am using NLTK to get synonyms for the words in a dataframe.
print(df)
col_1   col_2
Book    5
Pen     5
Pencil  6
def get_synonyms(df, column_name):
    df_1 = df["col_1"]
    for i in df_1:
        syn = wn.synsets(i)
        for synset in list(wn.all_synsets('n'))[:2]:
            print(i, "-->", synset)
            print("-----------")
            for lemma in synset.lemmas():
                print(lemma.name())
                ci = lemma.name()
    return(syn)
It does work, but I would like to get the following dataframe, containing the first "n" synonyms of each word in "col_1":
print(df_final)
col_1  synonym
Book   booklet
Book   album
Pen    cage
...
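I tried initializing an empty list before the synset loop and before the lemma loop and then appending to it, but it did not work in either case; for example: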
synonyms = []
for lemma in synset.lemmas():
    print(lemma.name())
    ci = lemma.name()
    synonyms.append(ci)
You can use:
import pandas as pd
from nltk.corpus import wordnet
from itertools import chain

def get_synonyms(df, column_name, N):
    L = []
    for i in df[column_name]:
        # all synsets (word senses) WordNet knows for this word
        syn = wordnet.synsets(i)
        # flatten all lemma lists with chain, remove duplicates with set
        # (set() does not preserve order, so which N lemmas are kept is arbitrary)
        lemmas = list(set(chain.from_iterable([w.lemma_names() for w in syn])))
        for j in lemmas[:N]:
            # append a [word, synonym] pair to the final list
            L.append([i, j])
    # create the DataFrame from the collected pairs
    return pd.DataFrame(L, columns=['word', 'syn'])

# N is the maximum number of synonyms kept per word
df1 = get_synonyms(df, 'col_1', 3)
print(df1)
     word           syn
0    Book   record_book
1    Book          book
2    Book          Word
3     Pen  penitentiary
4     Pen       compose
5     Pen           pen
6  Pencil        pencil
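Because the function above goes through set(), the synonyms it keeps can differ between runs, and the word itself can show up as its own synonym (as with "book" and "pen" in the output). If that matters, here is a minimal sketch of an order-preserving variant. The names first_n_unique, synonym_pairs, wordnet_lookup and get_synonyms_ordered, and the skip_self flag, are made up for this example and are not part of NLTK or pandas; the lookup is passed in as a parameter so the core logic can be tried without the WordNet corpus installed.

```python
from itertools import chain, islice


def first_n_unique(items, n):
    """Return the first n distinct items, keeping their original order."""
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(islice(dict.fromkeys(items), n))


def synonym_pairs(words, lookup, n, skip_self=True):
    """Build [word, synonym] rows with at most n synonyms per word.

    lookup is any callable mapping a word to a list of lemma-name lists,
    one list per synset (an interface assumed for this sketch).
    """
    rows = []
    for word in words:
        lemmas = chain.from_iterable(lookup(word))
        if skip_self:
            # drop lemmas that are just the query word itself
            lemmas = (name for name in lemmas if name.lower() != word.lower())
        rows.extend([word, name] for name in first_n_unique(lemmas, n))
    return rows


def wordnet_lookup(word):
    """Lemma names per synset from WordNet."""
    # imported lazily; needs the corpus from nltk.download('wordnet')
    from nltk.corpus import wordnet
    return [synset.lemma_names() for synset in wordnet.synsets(word)]


def get_synonyms_ordered(df, column_name, n, lookup=wordnet_lookup):
    """Same output shape as get_synonyms, but with a stable synonym order."""
    import pandas as pd
    rows = synonym_pairs(df[column_name], lookup, n)
    return pd.DataFrame(rows, columns=["word", "syn"])
```

It is called the same way as before, e.g. df1 = get_synonyms_ordered(df, 'col_1', 3). Which synonyms come first then follows the order in which WordNet returns the synsets, instead of the arbitrary set order.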