KeyError when comparing the similarity of two statements using GloVe vectors
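To be honest, I'm new to NLP. I'm trying to use GloVe vectors to measure the similarity between two statements, but I'm getting a KeyError. Please let me know where I went wrong.
Thanks in advance for your help. If there is a better way to measure the similarity between statements, please let me know.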
```python
import re

import numpy as np
import scipy.spatial.distance
from nltk.corpus import stopwords  # needs nltk.download("stopwords") once

gloveFile = "/content/glove.6B.50d.txt"

def loadGloveModel(gloveFile):
    """Load GloVe vectors into a dict mapping word -> numpy array."""
    print("Loading Glove Model")
    model = {}
    with open(gloveFile, encoding="utf8") as f:
        # Stream the file line by line instead of printing/holding all of it
        for line in f:
            splitLine = line.split()
            word = splitLine[0]
            embedding = np.array([float(val) for val in splitLine[1:]])
            model[word] = embedding
    print("Done.", len(model), "words loaded!")
    return model

def preprocess(raw_text):
    # keep only letters
    letters_only_text = re.sub("[^a-zA-Z]", " ", raw_text)
    # convert to lower case and split
    words = letters_only_text.lower().split()
    # remove stopwords and duplicate words
    stopword_set = set(stopwords.words("english"))
    cleaned_words = list(set([w for w in words if w not in stopword_set]))
    return cleaned_words

def cosine_distance_wordembedding_method(s1, s2):
    # model[word] raises KeyError for any word not in the GloVe vocabulary
    vector_1 = np.mean([model[word] for word in preprocess(s1)], axis=0)
    vector_2 = np.mean([model[word] for word in preprocess(s2)], axis=0)
    cosine = scipy.spatial.distance.cosine(vector_1, vector_2)
    print('Word Embedding method with a cosine distance assesses that our two sentences are similar to',
          round((1 - cosine) * 100, 2), '%')

model = loadGloveModel(gloveFile)
# str4 (a sentence) and list121 (a list of sentences) are defined elsewhere in the notebook (not shown)
for i in list121:
    cosine_distance_wordembedding_method(str4, i)
```
Then I get the following error:
```
<ipython-input-54-d463b41223c3> in cosine_distance_wordembedding_method(s1, s2)
36 import scipy
37 vector_1 = np.mean([model[word] for word in preprocess(s1)],axis=0)
---> 38 vector_2 = np.mean([model[word] for word in preprocess(s2)],axis=0)
39 cosine = scipy.spatial.distance.cosine(vector_1, vector_2)
40 print('Word Embedding method with a cosine distance asses that our two sentences are similar to',round((1-cosine)*100,2),'%')
<ipython-input-54-d463b41223c3> in <listcomp>(.0)
36 import scipy
37 vector_1 = np.mean([model[word] for word in preprocess(s1)],axis=0)
---> 38 vector_2 = np.mean([model[word] for word in preprocess(s2)],axis=0)
39 cosine = scipy.spatial.distance.cosine(vector_1, vector_2)
40 print('Word Embedding method with a cosine distance asses that our two sentences are similar to',round((1-cosine)*100,2),'%')
KeyError: 'vehcile'
```
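A `KeyError` from `model[word]` means the token is not a key in the dict built by `loadGloveModel()`, i.e. it is not in the GloVe vocabulary. A minimal sketch for listing such tokens before averaging, assuming the same letters-only, lowercase, stopword-free cleaning as `preprocess()`. The helper `find_oov_words` and the toy vocabulary are hypothetical, and the stopword set is passed in so the sketch runs without NLTK.

```python
import re


def find_oov_words(text, vocab, stopword_set=frozenset()):
    """Return the cleaned tokens of `text` that are missing from `vocab`.

    Cleaning mirrors preprocess(): keep letters only, lowercase, drop
    stopwords. `vocab` is any mapping keyed by word, e.g. the GloVe dict.
    """
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    return sorted({w for w in words if w not in stopword_set and w not in vocab})


# Hypothetical toy vocabulary standing in for the real GloVe dict
toy_vocab = {"vehicle": [0.1, 0.2], "fast": [0.3, 0.4]}
print(find_oov_words("A fast vehcile", toy_vocab, {"a"}))
```

In the question's setting this would be called as `find_oov_words(str4, model, set(stopwords.words("english")))`; a non-empty result names the word that will break the `model[word]` lookup.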
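I found my mistake; I'm leaving the question up in case it helps someone else.
The mistake was a misspelling in my input: I typed "Vehcile" instead of "vehicle". After lowercasing, "vehcile" is not a key in the GloVe vocabulary, so `model[word]` raised `KeyError: 'vehcile'`.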
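To keep a single typo or rare word from aborting the whole comparison, the lookup can skip out-of-vocabulary tokens and report them instead of raising. The sketch below is one way to do that, not the question's code: `sentence_vector`, `sentence_similarity`, the toy 2-dimensional vectors and the passed-in stopword set are hypothetical stand-ins. It computes cosine similarity directly with NumPy, which gives the same value as `1 - scipy.spatial.distance.cosine(v1, v2)`.

```python
import re

import numpy as np


def sentence_vector(text, model, stopword_set=frozenset()):
    """Average the vectors of the in-vocabulary words of `text`.

    Cleaning mirrors preprocess(): letters only, lowercase, stopwords and
    duplicate words removed. Words missing from `model` are skipped rather
    than raising KeyError. Returns (vector or None, skipped_words).
    """
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    # dict.fromkeys drops duplicates while keeping first-seen order
    words = list(dict.fromkeys(w for w in words if w not in stopword_set))
    known = [w for w in words if w in model]
    skipped = [w for w in words if w not in model]
    if not known:
        return None, skipped
    return np.mean([model[w] for w in known], axis=0), skipped


def cosine_similarity(v1, v2):
    # Same value as 1 - scipy.spatial.distance.cosine(v1, v2)
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))


def sentence_similarity(s1, s2, model, stopword_set=frozenset()):
    v1, skipped1 = sentence_vector(s1, model, stopword_set)
    v2, skipped2 = sentence_vector(s2, model, stopword_set)
    if skipped1 or skipped2:
        print("Not in vocabulary, skipped:", skipped1 + skipped2)
    if v1 is None or v2 is None:
        return None  # a sentence had no known words, nothing to compare
    return cosine_similarity(v1, v2)


# Hypothetical 2-d vectors standing in for the real GloVe dict
toy_model = {
    "vehicle": np.array([1.0, 0.0]),
    "car": np.array([0.9, 0.1]),
    "banana": np.array([0.0, 1.0]),
}
sim = sentence_similarity("A vehcile car", "a vehicle", toy_model, {"a"})
print(round(sim, 4))
```

With the real vectors the call would look like `sentence_similarity(str4, i, model, set(stopwords.words("english")))`. Skipping unknown words keeps the loop running, but the printed list is worth checking: silently dropping a misspelled key word can make two sentences look more or less similar than they are.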