ValueError (could not broadcast) when using word vectors: how to fix?

Question

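I am trying to build a chatbot for a project, and I am using spaCy. I am following a tutorial, and I need to create a 2D array X with as many rows as there are sentences in my dataset. Each row is a vector describing one sentence. However, I get an error when I try to build this array. I am not sure what is causing it, since I am new to spaCy and to NLP in general.

I have tried to work out the problem from the documentation. I have also searched Stack Overflow, but could not find anything that explains my problem.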

import spacy
import numpy
#load spacy nlp model
nlp = spacy.load("en_core_web_sm")

#calculate the length of my sentences dataset
n_sentences = len(sentences)
#calculate the dimensionality of nlp model
embedding_dim = nlp.vocab.vectors_length
#X is a 2D array with as many rows as there are sentences in my dataset
#Each row is a vector describing the sentence
#initialise array with zeros
X = numpy.zeros((n_sentences, embedding_dim))
#iterate over sentences
for idx, sentence in enumerate(sentences):
   #pass each sentence to nlp object to create document
   doc = nlp(sentence)
   print(doc.vector.shape)
   #save document's .vector attribute to corresponding row in X
   X[idx, :] = doc.vector

As far as I can tell, the last line is the one that throws the error.

ValueError: could not broadcast input array from shape (96) into shape (1,0)

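I don't know what is causing this, as I am not very familiar with numpy arrays and array shapes. My dataset, sentences, is a simple list of strings. I expected to end up with a 2D array containing the word vectors. The tutorial I am following says the code is correct, so I am not sure why it does not work for me; I think I must have missed something.

This is for an academic (A-Level) project.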

Answer 1

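The en_core_web_sm model does not include word vectors. That is consistent with the error message: nlp.vocab.vectors_length is 0, so X is created with zero columns, and the 96-element doc.vector cannot be assigned into a zero-width row. You can download the en_core_web_md or en_core_web_lg model instead.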
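The shape mismatch can be reproduced with numpy alone, which may help if array shapes are unfamiliar. This is only a sketch: the embedding width of 0 and the 96-element stand-in vector are assumptions taken from the error message in the question, and no spaCy model is loaded.

```python
import numpy as np

# Assumed values: 3 sentences, and an embedding width of 0, as a model
# without word vectors would report.
n_sentences = 3
embedding_dim = 0

# The array has rows but no columns.
X = np.zeros((n_sentences, embedding_dim))

# Stand-in for doc.vector; the error in the question mentions 96 elements.
fake_doc_vector = np.ones(96, dtype=np.float32)

try:
    # 96 values cannot be written into a row that holds 0 values.
    X[0, :] = fake_doc_vector
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(X.shape)
print(error_message)

# Sizing the array from the vector itself removes the mismatch.
X_fixed = np.zeros((n_sentences, fake_doc_vector.shape[0]))
X_fixed[0, :] = fake_doc_vector
print(X_fixed.shape)
```

Printing X.shape right after creating the array is a quick way to spot a zero-width array before the loop runs.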

Reference

import spacy
# en_core_web_md ships with word vectors, unlike en_core_web_sm
nlp = spacy.load("en_core_web_md")
print(nlp.vocab.vectors_length)
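Once a model with vectors is loaded, one way to avoid this class of error is to stack the vectors directly instead of preallocating X from a separately computed width. The helper below is a hypothetical sketch, not part of spaCy or numpy; the name build_sentence_matrix and the demo data (four vectors of an arbitrary width of 5) are made up for illustration.

```python
import numpy as np


def build_sentence_matrix(vectors):
    """Stack equal-length 1-D vectors into a 2-D array, one row per vector.

    Hypothetical helper for illustration only.
    """
    rows = [np.asarray(v, dtype=np.float32) for v in vectors]
    if not rows:
        raise ValueError("no vectors were provided")
    width = rows[0].size
    if width == 0:
        # This is the situation described in the question.
        raise ValueError(
            "vectors have length 0; the model may not include word vectors"
        )
    if any(r.shape != (width,) for r in rows):
        raise ValueError("all vectors must be 1-D and of the same length")
    return np.vstack(rows)


# Stand-in data so the sketch runs without a spaCy model installed.
demo_vectors = [np.full(5, i, dtype=np.float32) for i in range(4)]
X = build_sentence_matrix(demo_vectors)
print(X.shape)
```

With spaCy, the same helper could presumably be called as build_sentence_matrix(nlp(s).vector for s in sentences), assuming each doc.vector is a 1-D numpy array.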
