Key error while comparing the similarity between two statements using GloVe vectors

Question

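To be honest, I am new to NLP. I am trying to find the similarity between two statements using GloVe vectors, but I am getting a KeyError. Please let me know where I went wrong. Thanks in advance for your help, and if there is a better way to measure the similarity between statements, please let me know.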

gloveFile = "/content/glove.6B.50d.txt"
import numpy as np
def loadGloveModel(gloveFile):
    print ("Loading Glove Model")
    with open(gloveFile, encoding="utf8" ) as f:
        content = f.readlines()
    model = {}
    for line in content:
        splitLine = line.split()
        word = splitLine[0]
        embedding = np.array([float(val) for val in splitLine[1:]])
        model[word] = embedding
    print ("Done.",len(model)," words loaded!")
    return model

import re
from nltk.corpus import stopwords
import pandas as pd

def preprocess(raw_text):

    # keep only words
    letters_only_text = re.sub("[^a-zA-Z]", " ", raw_text)

    # convert to lower case and split 
    words = letters_only_text.lower().split()

    # remove stopwords
    stopword_set = set(stopwords.words("english"))
    cleaned_words = list(set([w for w in words if w not in stopword_set]))

    return cleaned_words

def cosine_distance_wordembedding_method(s1, s2):
    import scipy
    vector_1 = np.mean([model[word] for word in preprocess(s1)],axis=0)
    vector_2 = np.mean([model[word] for word in preprocess(s2)],axis=0)
    cosine = scipy.spatial.distance.cosine(vector_1, vector_2)
    print('Word Embedding method with a cosine distance asses that our two sentences are similar to',round((1-cosine)*100,2),'%')

model = loadGloveModel(gloveFile)
for i in list121:
  cosine_distance_wordembedding_method(str4,i)

Then I get the following error:

<ipython-input-54-d463b41223c3> in cosine_distance_wordembedding_method(s1, s2)
     36     import scipy
     37     vector_1 = np.mean([model[word] for word in preprocess(s1)],axis=0)
---> 38     vector_2 = np.mean([model[word] for word in preprocess(s2)],axis=0)
     39     cosine = scipy.spatial.distance.cosine(vector_1, vector_2)
     40     print('Word Embedding method with a cosine distance asses that our two sentences are similar to',round((1-cosine)*100,2),'%')

<ipython-input-54-d463b41223c3> in <listcomp>(.0)
     36     import scipy
     37     vector_1 = np.mean([model[word] for word in preprocess(s1)],axis=0)
---> 38     vector_2 = np.mean([model[word] for word in preprocess(s2)],axis=0)
     39     cosine = scipy.spatial.distance.cosine(vector_1, vector_2)
     40     print('Word Embedding method with a cosine distance asses that our two sentences are similar to',round((1-cosine)*100,2),'%')

KeyError: 'vehcile'

Answer 1

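I found my mistake; I am leaving this question up in case it helps someone. The problem was a misspelling in my input: I had typed "Vehcile" instead of "vehicle". After lowercasing, the misspelled token 'vehcile' is not a key in the GloVe dictionary, so model[word] raises the KeyError.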
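
For anyone hitting the same error with words that are genuinely missing from the vocabulary (typos, rare words, names), one possible workaround is to skip unknown tokens instead of indexing the dictionary directly. The following is only a minimal sketch, not the code from the question: `sentence_vector`, `cosine_similarity`, and `toy_model` are hypothetical names, and the two-dimensional toy vectors stand in for the real dictionary returned by `loadGloveModel`.

```python
import numpy as np

def sentence_vector(words, model, dim=50):
    """Average the vectors of the words present in `model`, skipping the rest.

    `dim` is assumed to match the embedding size (50 for glove.6B.50d.txt).
    """
    known = [model[w] for w in words if w in model]
    missing = [w for w in words if w not in model]
    if missing:
        # Report skipped tokens so that typos such as 'vehcile' are easy to spot.
        print("Not in vocabulary, skipped:", missing)
    if not known:
        # No usable word at all: fall back to a zero vector.
        return np.zeros(dim)
    return np.mean(known, axis=0)

def cosine_similarity(v1, v2):
    """Cosine similarity that returns 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denom == 0:
        return 0.0
    return float(np.dot(v1, v2) / denom)

# Toy stand-in for the GloVe dictionary (assumption: real vectors are longer).
toy_model = {
    "vehicle": np.array([1.0, 0.0]),
    "car": np.array([0.8, 0.6]),
}

v1 = sentence_vector(["vehcile", "car"], toy_model, dim=2)  # 'vehcile' is skipped
v2 = sentence_vector(["vehicle"], toy_model, dim=2)
print(round(cosine_similarity(v1, v2), 2))
```

In the question's code this would mean passing the output of `preprocess(s1)` and the loaded `model` to `sentence_vector` instead of using `model[word]` inside the list comprehension. Skipping words silently changes the sentence average, so printing the skipped tokens is a deliberate choice here: it keeps misspellings visible rather than hiding them.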

nlp

similarity

stanford-nlp