Gradient descent: should the delta value be a scalar or a vector?

Question

When computing the delta values of a neural network after running backpropagation:

the value of delta(1) comes out as a scalar. Should it be a vector?

Update:

Taken from http://www.holehouse.org/mlclass/09_Neural_Networks_Learning.html

Specifically:

Answer 1

First, as you probably know, each layer has n x m parameters (weights) to learn, so they form a two-dimensional matrix, where:

n is the number of nodes in the current layer
m is the number of nodes in the previous layer plus 1 (for the bias unit).

There are n x m parameters because every node in the previous layer is connected to every node in the current layer.
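As a quick check on the shapes, the following Python sketch computes the size of the weight matrix between two layers and builds it zero-initialized. It assumes the convention from the course the linked notes cover, where the previous layer's bias unit adds one extra column; the helper names `weight_shape` and `zeros` and the layer sizes 3 and 4 are hypothetical, chosen only for illustration.

```python
def weight_shape(prev_nodes, curr_nodes):
    """Return (rows, cols) of the weight matrix connecting two layers.

    Assumed convention: one row per node in the current layer,
    one column per node in the previous layer plus 1 for the bias unit.
    """
    return (curr_nodes, prev_nodes + 1)


def zeros(rows, cols):
    """Create a rows x cols matrix of zeros as a list of lists."""
    return [[0.0] * cols for _ in range(rows)]


# Hypothetical sizes: 3 input nodes feeding a hidden layer of 4 nodes.
rows, cols = weight_shape(prev_nodes=3, curr_nodes=4)
theta = zeros(rows, cols)
print(rows, cols, rows * cols)  # 4 4 16
```

Each of the 16 entries corresponds to one connection (including the bias connections), which is why the gradient accumulator for this layer needs the same 4 x 4 shape.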

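I'm fairly sure that the Delta (capital delta) of layer L is used to accumulate the partial-derivative terms for every parameter of layer L, so you have one 2-D Delta matrix per layer. To update the entry in row i (the i-th node of the current layer) and column j (the j-th node of the previous layer):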

D_(i,j) = D_(i,j) + a_j * delta_i
note: a_j is the activation of the j-th node in the previous layer,
      delta_i is the error term of the i-th node in the current layer,
so each error term is accumulated in proportion to the activation feeding that connection.
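The element-wise update above can be sketched in plain Python as a double loop over rows i and columns j, applied once per training example. The function name `accumulate` and the numbers are hypothetical; the activation vector is assumed to already include the bias unit as its first entry.

```python
def accumulate(D, a, delta):
    """Add one training example's contribution to the accumulator D in place.

    D     : matrix (list of lists); rows = current-layer nodes,
            columns = previous-layer activations (bias included in a)
    a     : activations of the previous layer, length = number of columns
    delta : error terms of the current layer, length = number of rows
    """
    for i in range(len(delta)):
        for j in range(len(a)):
            # D_(i,j) = D_(i,j) + a_j * delta_i
            D[i][j] = D[i][j] + a[j] * delta[i]
    return D


# Hypothetical values: bias unit (1.0) plus 2 nodes in the previous layer,
# 2 nodes in the current layer.
a = [1.0, 0.5, 0.25]
delta = [0.1, -0.2]
D = [[0.0] * len(a) for _ in range(len(delta))]
accumulate(D, a, delta)
print(D)
```

Note that each delta_i is a single number per node, but the loop touches every (i, j) pair, so the accumulator has one entry per weight.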

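So, to answer your question: Delta should be a matrix. The small delta of a layer, which holds one error term per node, is a vector.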
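In vectorized form, the per-layer small delta is a vector, and updating all (i, j) entries at once amounts to adding the outer product delta a^T, which is why the accumulated Delta comes out as a matrix. The sketch below (pure Python, hypothetical numbers and helper names `outer` and `add_inplace`) accumulates over a two-example batch and then averages; dividing by the number of examples without a regularization term is an assumption made to keep the example short.

```python
def outer(delta, a):
    """Outer product delta * a^T: a len(delta) x len(a) matrix."""
    return [[d * x for x in a] for d in delta]


def add_inplace(D, M):
    """D <- D + M, element-wise."""
    for i, row in enumerate(M):
        for j, v in enumerate(row):
            D[i][j] += v


# Hypothetical mini-batch: each pair is (a for the previous layer incl. bias,
# delta for the current layer). delta is a vector: one entry per node.
batch = [
    ([1.0, 0.5, 0.25], [0.1, -0.2]),
    ([1.0, 0.0, 1.0], [0.3, 0.4]),
]

Delta = [[0.0] * 3 for _ in range(2)]  # capital Delta: a 2 x 3 matrix
for a, delta in batch:
    add_inplace(Delta, outer(delta, a))

# Average over the examples to get the (assumed unregularized) gradient.
m = len(batch)
grad = [[v / m for v in row] for row in Delta]
print(grad)
```

With NumPy the same per-example term would be `np.outer(delta, a)`; either way the result has one entry per weight in the layer.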

