Why does Tensorflow's sampled_softmax_loss force you to use a bias, when experts recommend no bias be used for Word2Vec?

Question

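Every TensorFlow implementation of Word2Vec that I have seen includes a bias in the negative-sampling softmax function, including the one on the official TensorFlow website: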

https://www.tensorflow.org/tutorials/word2vec#vector-representations-of-words

loss = tf.reduce_mean(
  tf.nn.nce_loss(weights=nce_weights,
                 biases=nce_biases,
                 labels=train_labels,
                 inputs=embed,
                 num_sampled=num_sampled,
                 num_classes=vocabulary_size))

This one is from Google's free deep learning course: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb

loss = tf.reduce_mean(
  tf.nn.sampled_softmax_loss(weights=softmax_weights,
                             biases=softmax_biases,
                             inputs=embed,
                             labels=train_labels,
                             num_sampled=num_sampled,
                             num_classes=vocabulary_size))

However, in the lectures by both Andrew Ng and Richard Socher, the negative-sampling softmax does not include a bias.

Even where the idea originated, Mikolov states:

biases are not used in the neural network, as no significant improvement of performance was observed - following the Occam's razor, the solution is as simple as it needs to be.

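Mikolov, T.: Statistical Language Models Based on Neural Networks, p. 29 http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf

So why do the official TensorFlow implementations have a bias, and why does there seem to be no option to leave the bias out of the sampled_softmax_loss function?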

Answer 1

The exercise you link defines softmax_biases to be zeros:

softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

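That is: they are not using any actual bias in their word2vec example.

The sampled_softmax_loss() function is generic and is used for many neural networks. Its decision to require a biases argument is unrelated to what is best for one particular neural-network application (word2vec), and it accommodates the word2vec case by allowing, as here, all zeros.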
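
To illustrate why an all-zero biases argument amounts to "no bias", here is a minimal pure-Python sketch. It is not TensorFlow's implementation: toy_sampled_softmax_loss and toy_no_bias_loss are hypothetical helpers, and the sketch leaves out the correction for sampling probabilities that a real sampled softmax applies to the logits. One caveat worth checking in your own code: a tf.Variable is trainable by default, so biases that are merely initialised to zero may move during training unless you create the variable with trainable=False or pass a constant such as tf.zeros([vocabulary_size]).

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def toy_sampled_softmax_loss(weights, biases, x, label, sampled):
    """Cross-entropy over the true class plus the sampled negatives (with bias)."""
    classes = [label] + list(sampled)
    logits = [dot(weights[c], x) + biases[c] for c in classes]
    return log_sum_exp(logits) - logits[0]


def toy_no_bias_loss(weights, x, label, sampled):
    """The same loss with the bias term removed entirely."""
    classes = [label] + list(sampled)
    logits = [dot(weights[c], x) for c in classes]
    return log_sum_exp(logits) - logits[0]


# Toy sizes, chosen only for illustration.
vocabulary_size, embedding_size = 5, 3
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(embedding_size)]
           for _ in range(vocabulary_size)]
x = [rng.uniform(-1, 1) for _ in range(embedding_size)]
label, sampled = 0, [2, 4]

zero_biases = [0.0] * vocabulary_size
loss_zero_bias = toy_sampled_softmax_loss(weights, zero_biases, x, label, sampled)
loss_no_bias = toy_no_bias_loss(weights, x, label, sampled)

# A non-zero bias on the true class raises its logit and so lowers the loss.
some_biases = [1.0] + [0.0] * (vocabulary_size - 1)
loss_with_bias = toy_sampled_softmax_loss(weights, some_biases, x, label, sampled)

print(loss_zero_bias, loss_no_bias, loss_with_bias)
```

The first two printed values should agree, because adding zero to every logit leaves the softmax unchanged. The third differs, which shows that the bias only matters once it takes non-zero values.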


word2vec

deep-learning

tensorflow

word-embedding