Shape of weights in the Softmax layer in Word2vec (skip-gram)

I have a question about the shape of the weights in the Softmax layer.

Suppose our vocabulary has 10,000 words and our embedding layer reduces the dimensionality to 300.

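So the input is a one-hot vector of length 10,000, and the embedding layer has 300 neurons. This means the weight matrix from the input layer to the embedding layer has shape 10000 × 300 (number of words in the vocabulary × number of neurons in the embedding layer).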
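A minimal NumPy sketch of the input side, assuming random weights purely for illustration (NumPy and the word index 1234 are not from the question): multiplying a one-hot vector of length 10,000 by a 10000 × 300 matrix simply selects one row, which is the word's 300-dimensional embedding.

```python
import numpy as np

# Sizes taken from the question: 10,000-word vocabulary, 300-dim embeddings.
VOCAB_SIZE = 10_000
EMBED_DIM = 300

rng = np.random.default_rng(0)

# Input-to-embedding weights: one row per vocabulary word -> shape (10000, 300).
W_in = rng.standard_normal((VOCAB_SIZE, EMBED_DIM)).astype(np.float32)

def embed_via_matmul(word_id: int) -> np.ndarray:
    """Multiply a one-hot row vector (1, 10000) by W_in (10000, 300)."""
    one_hot = np.zeros((1, VOCAB_SIZE), dtype=np.float32)
    one_hot[0, word_id] = 1.0
    return one_hot @ W_in  # shape (1, 300)

def embed_via_lookup(word_id: int) -> np.ndarray:
    """Select row `word_id` of W_in directly; no multiplication needed."""
    return W_in[word_id:word_id + 1]  # shape (1, 300)

word_id = 1234  # hypothetical word index, for illustration only
h_mm = embed_via_matmul(word_id)
h_lu = embed_via_lookup(word_id)
print(h_mm.shape, h_lu.shape)      # (1, 300) (1, 300)
print(np.allclose(h_mm, h_lu))     # True
```

This is why the input weights are usually implemented as a lookup table rather than an actual matrix multiplication.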

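According to this tutorial (https://www.kaggle.com/christofer/word2vec-skipgram-model-with-tensorflow) and many others, the next weight matrix (connecting the embedding layer to the Softmax classifier) has the same shape (number of words in the vocabulary × embedding size, i.e. 10000 × 300 in our example). I don't understand why. Shouldn't it be 300 × 10000, since we have to predict 10,000 probabilities, one per class?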
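A minimal NumPy sketch of the output side, assuming random weights and a zero bias for illustration: if the output weights are stored as [num_classes, dim] = (10000, 300), the 10,000 logits are obtained by multiplying the hidden vector by the transpose. The "300 × 10000" matrix from the question is the same matrix, only transposed, so both layouts give identical logits.

```python
import numpy as np

VOCAB_SIZE = 10_000
EMBED_DIM = 300

rng = np.random.default_rng(0)

# Output weights in the layout the tutorial uses: one row per output class.
W_out = rng.standard_normal((VOCAB_SIZE, EMBED_DIM)).astype(np.float32)  # (10000, 300)
b_out = np.zeros(VOCAB_SIZE, dtype=np.float32)                            # (10000,)

# Hidden (embedding) vector of one center word: shape (1, 300).
h = rng.standard_normal((1, EMBED_DIM)).astype(np.float32)

# [num_classes, dim] layout: multiply by the transpose.
logits_rows = h @ W_out.T + b_out        # (1, 300) @ (300, 10000) -> (1, 10000)

# The "300 x 10000" layout from the question is just W_out transposed.
W_expected = W_out.T                     # (300, 10000)
logits_cols = h @ W_expected + b_out     # (1, 10000)

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(logits_rows)             # one probability per vocabulary word
print(logits_rows.shape)                 # (1, 10000)
print(np.allclose(logits_rows, logits_cols))  # True
```

Storing one row per class costs nothing mathematically; it only changes which side of the product carries the transpose.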

Can you explain this?

Because of the tf.nn.sampled_softmax_loss function. This function is designed to require a weight matrix of shape [vocabulary size, dim].

According to the documentation,

weights: A Tensor of shape [num_classes, dim], or a list of Tensor objects whose concatenation along dimension 0 has shape [num_classes, dim]. The (possibly-sharded) class embeddings.

Why is it designed this way?

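sampled_softmax_loss works by sampling a subset of the output nodes (classes) in each iteration and optimizing only the weights that belong to that subset, instead of updating the weights of all output nodes every time. The weights of the sampled classes are picked out with embedding_lookup, which gathers rows along dimension 0. A weight tensor of shape [vocab_size, dim], with one row per class, is therefore exactly what this lookup needs. Note that [vocab_size, dim] is simply the transpose of the 300 × 10000 matrix you expected: the function computes the logits as inputs × weightsᵀ, so the model is mathematically the same.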
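The mechanism can be sketched in NumPy for a single training example. This is a simplified illustration, not TensorFlow's implementation: it assumes a uniform sampler (tf.nn.sampled_softmax_loss defaults to a log-uniform sampler), skips the log-probability correction (with a uniform sampler it shifts every logit by the same amount, which leaves the softmax unchanged), and does not update the bias. NUM_SAMPLED, LEARNING_RATE and the label id are hypothetical values chosen for illustration. Fancy indexing `W_out[ids]` plays the role of embedding_lookup.

```python
import numpy as np

VOCAB_SIZE = 10_000
EMBED_DIM = 300
NUM_SAMPLED = 5        # hypothetical number of negative classes per example
LEARNING_RATE = 1.0    # hypothetical SGD step size

rng = np.random.default_rng(0)

# Output weights in the [num_classes, dim] layout, small random init.
W_out = rng.normal(0.0, EMBED_DIM ** -0.5, size=(VOCAB_SIZE, EMBED_DIM))
b_out = np.zeros(VOCAB_SIZE)

h = rng.normal(0.0, EMBED_DIM ** -0.5, size=EMBED_DIM)  # center-word embedding
label = 42                                               # hypothetical context-word id

# 1. Sample negative class ids uniformly and drop the true label if drawn.
candidates = rng.choice(VOCAB_SIZE, size=NUM_SAMPLED + 1, replace=False)
sampled_ids = candidates[candidates != label][:NUM_SAMPLED]
ids = np.concatenate(([label], sampled_ids))  # true class at position 0

# 2. Gather only the needed rows (the embedding_lookup step): a cheap
#    row selection because each class owns one row.
w_sub = W_out[ids]     # (1 + NUM_SAMPLED, 300)
b_sub = b_out[ids]     # (1 + NUM_SAMPLED,)

# 3. Logits and softmax cross-entropy over the small subset only.
logits = w_sub @ h + b_sub
shifted = logits - logits.max()
log_probs = shifted - np.log(np.exp(shifted).sum())
loss = -log_probs[0]

# 4. Gradient w.r.t. the gathered rows: (softmax - one_hot) outer h.
grad_logits = np.exp(log_probs)
grad_logits[0] -= 1.0
grad_w_sub = np.outer(grad_logits, h)

# 5. Sparse update: only 1 + NUM_SAMPLED of the 10,000 rows change.
W_before = W_out.copy()
W_out[ids] -= LEARNING_RATE * grad_w_sub
rows_changed = int(np.any(W_out != W_before, axis=1).sum())
print(w_sub.shape, rows_changed)  # (6, 300) 6
```

With a 300 × 10000 layout the same step would have to gather columns instead, which is why the row-per-class layout is the natural fit for a lookup-based sampler.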
