What is a projection layer in the context of neural networks?
I am currently trying to understand the architecture behind the word2vec neural-net learning algorithm, which represents words as vectors based on their context.
After reading the Tomas Mikolov paper I came across what he defines as a projection layer. Even though this term is used widely when referring to word2vec, I could not find a precise definition of what it actually is in the neural-net context.
My question is: in the neural-net context, what is a projection layer? Is it the name given to a hidden layer whose links to the previous nodes share the same weights? Do its units actually have an activation function of some kind?
Another resource that refers to the problem more broadly can be found in this tutorial, which also discusses a projection layer around page 67.
The continuous bag of words model is used to predict a single word given its prior and future entries: it is therefore a result of the context.
The inputs are the weights computed from the prior and future entries, and all entries are given the same new weights; as a result the complexity/feature count of this model is much smaller than that of many other NN architectures.
Re: what is the projection layer. From the paper you cite:
the non-linear hidden layer is removed and the projection layer is shared for all words (not just the projection matrix); thus, all words get projected into the same position (their vectors are averaged).
So the projection layer is a single set of shared weights, and no activation function is indicated.
Note that the weight matrix between the input and the projection layer is shared for all word positions in the same way as in the NNLM
So the hidden layer is in effect represented by this single set of shared weights; and, as you correctly implied, it is identical over all of the input nodes.
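To make that concrete, here is a minimal NumPy sketch of a CBOW-style projection step as described above; the matrix name P and all sizes are made up for illustration and are not taken from the paper:

```python
import numpy as np

vocab_size, dim = 10_000, 300          # illustrative sizes only
P = np.random.randn(vocab_size, dim)   # the single shared projection weight matrix

def project_context(context_word_ids):
    # every context word is looked up in the same shared matrix (no per-position weights)
    vectors = P[context_word_ids]      # shape: (len(context), dim)
    # "all words get projected into the same position (their vectors are averaged)"
    return vectors.mean(axis=0)        # no non-linear activation is applied

# e.g. predict the middle word from the two words before and the two after it
h = project_context([12, 873, 40, 5])  # shape: (dim,)
```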
The projection layer maps the discrete word indices of an n-gram context to a continuous vector space.
As explained in this thesis:
The projection layer is shared such that for contexts containing the same word multiple times, the same set of weights is applied to form each part of the projection vector. This organization effectively increases the amount of data available for training the projection layer weights since each word of each context training pattern individually contributes changes to the weight values.
This figure shows, for a trivial topology, how the output of the projection layer can be assembled efficiently by copying columns from the projection layer's weight matrix.
Now, the hidden layer:
The hidden layer processes the output of the projection layer and is also created with a number of neurons specified in the topology configuration file.
Edit: an explanation of what is happening in the diagram:
Each neuron in the projection layer is represented by a number of weights equal to the size of the vocabulary. The projection layer differs from the hidden and output layers by not using a non-linear activation function. Its purpose is simply to provide an efficient means of projecting the given n- gram context onto a reduced continuous vector space for subsequent processing by hidden and output layers trained to classify such vectors. Given the one-or-zero nature of the input vector elements, the output for a particular word with index i is simply the ith column of the trained matrix of projection layer weights (where each row of the matrix represents the weights of a single neuron).
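As a rough sketch of the column copying that quote describes (NNLM-style: the projection vector for an n-gram context is the concatenation of one column per context word), assuming the quote's convention that each row of the weight matrix holds one neuron's weights; the names and sizes here are illustrative only:

```python
import numpy as np

vocab_size, proj_dim, context_len = 10_000, 100, 4   # illustrative sizes
W = np.random.randn(proj_dim, vocab_size)            # each row = one projection neuron's weights

def projection_output(context_word_ids):
    # With one-hot inputs there is nothing to multiply: the output for word i
    # is simply the i-th column of W, and the n-gram context is assembled by
    # concatenating (copying) one column per context word.
    return np.concatenate([W[:, i] for i in context_word_ids])

x = projection_output([3, 42, 7, 99])   # shape: (proj_dim * context_len,) = (400,)
```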
I find the previous answers here a bit over-complicated: a projection layer is just a simple matrix multiplication, or, in NN terms, a regular/dense/linear layer without a non-linear activation at the end (sigmoid/tanh/relu/etc.). The idea is to project the (e.g.) 100K-dimensional discrete vector onto a 600-dimensional continuous vector (I picked the numbers at random here; "your mileage may vary"). The exact matrix parameters are learned through the training process.
What happens before/after depends on the model and the context, and is not what the OP asked about.
(In practice you would not even bother with the matrix multiplication, since you are multiplying by a one-hot vector that has a 1 at the word index and 0s everywhere else; you would instead treat the trained matrix as a look-up table, i.e. the 6257th word in the corpus corresponds to the 6257th row/column of the projection matrix, depending on how you define it.)
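A tiny sketch of that equivalence, with a random stand-in for a trained matrix (the sizes are scaled down from the 100K/600 example above so it runs quickly):

```python
import numpy as np

vocab_size, dim = 10_000, 60            # scaled-down illustrative sizes
P = np.random.randn(vocab_size, dim)    # stand-in for a trained projection matrix

word_index = 6257
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

via_matmul = one_hot @ P                # the "dense linear layer with no activation" view
via_lookup = P[word_index]              # the look-up-table view

assert np.allclose(via_matmul, via_lookup)
```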