Backpropagation activation derivative

I've implemented backpropagation as explained in this video: https://class.coursera.org/ml-005/lecture/51

It seems to have worked: it passes gradient checking and lets me train on the MNIST digits.

However, I've noticed that most other explanations of backpropagation compute the output delta as

d = (a - y) * f'(z) http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm

whereas the video uses

d = (a - y).

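When I multiply the delta by the activation derivative (the sigmoid derivative), I no longer get the same gradients as the gradient check (they differ by at least an order of magnitude).

What allows Andrew Ng (in the video) to leave the activation derivative out of the output delta, and why does that work? And why are the computed gradients wrong once I add the derivative?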

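Edit

I have now tested with both linear and sigmoid activation functions on the output, and the gradient check only passes when I use Ng's delta equation (without the sigmoid derivative) in both cases.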
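To reproduce the effect described above, here is a minimal gradient-checking sketch in plain Python. It assumes the cross-entropy cost used in the video, a single sigmoid output unit with weights `w`, and one training example `x`, `y`; all names and numbers are made up for illustration, and the bias is folded into `x[0] = 1.0`. The analytic gradient built from `d = a - y` matches the finite-difference estimate, while the one built from `d = (a - y) * g'(z)` does not.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x):
    # Single sigmoid output unit: a = g(z) with z = w . x
    z = sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid(z)

def cross_entropy(w, x, y):
    a = forward(w, x)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def analytic_grad(w, x, y, multiply_by_gprime):
    a = forward(w, x)
    delta = a - y                    # output delta as in the video
    if multiply_by_gprime:
        delta *= a * (1 - a)         # extra sigmoid derivative g'(z) = a(1 - a)
    return [delta * xi for xi in x]  # dJ/dw_j = delta * x_j

def numerical_grad(w, x, y, eps=1e-5):
    # Centered finite differences, the usual gradient-checking recipe
    grad = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        grad.append((cross_entropy(w_plus, x, y) - cross_entropy(w_minus, x, y)) / (2 * eps))
    return grad

w = [0.3, -0.8, 0.5]
x = [1.0, 2.0, -1.5]  # x[0] = 1.0 plays the role of the bias input
y = 1.0

num = numerical_grad(w, x, y)
err_plain = max(abs(n - g) for n, g in zip(num, analytic_grad(w, x, y, False)))
err_gprime = max(abs(n - g) for n, g in zip(num, analytic_grad(w, x, y, True)))
print(err_plain)   # at the level of finite-difference noise
print(err_gprime)  # much larger: the extra g'(z) shrinks every component
```

With these particular numbers the extra factor `a * (1 - a)` is roughly 0.1, which is in line with the "order of magnitude" mismatch reported in the question.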

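When working with neural networks, how you design the network depends on the learning task. A common approach for regression tasks is to use the tanh() activation function for the input and all hidden layers, and a linear activation function for the output layer.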

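I couldn't find the source, but there is a theorem stating that combining nonlinear and linear activation functions lets you approximate the target function better. Examples using different activation functions can be found here and here.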

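There are many different kinds of activation functions you can use. If you look at their derivatives, you'll see that the derivative of a linear function is 1, so that factor simply drops out of the delta. The same applies to Ng's explanation: at around minute 12 of the video, you can see he is talking about the output layer.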
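A tiny sketch of the linear-output case, assuming a squared-error cost $\frac{1}{2}(a - y)^2$ (the 1/2 factor and the sample values are my own choices): with the identity activation, $g'(z) = 1$, so the output delta reduces to `a - y`, and it agrees with a finite-difference derivative with respect to $z$.

```python
def squared_error(z, y):
    a = z                     # linear (identity) output activation
    return 0.5 * (a - y) ** 2

def output_delta_linear(z, y):
    a = z
    g_prime = 1.0             # derivative of the identity activation
    return (a - y) * g_prime  # reduces to a - y

z, y, eps = 0.7, 0.2, 1e-6
numeric = (squared_error(z + eps, y) - squared_error(z - eps, y)) / (2 * eps)
analytic = output_delta_linear(z, y)
print(numeric, analytic)
```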

On the backpropagation algorithm:

"When neuron is located in the output layer of the network, it is supplied with a desired response of its own. We may use e(n) = d(n) - y(n) to compute the error signal e(n) associated with this neuron; see Fig. 4.3. Having determined e(n), we find it a straightforward matter to compute the local gradient [...] When neuron is located in a hidden layer of the network, there is no specified desired response for that neuron. Accordingly, the error signal for a hidden neuron would have to be determined recursively and working backwards in terms of the error signals of all the neurons to which that hidden neuron is directly connected"

Haykin, Simon S., et al. Neural Networks and Learning Machines. Vol. 3. Upper Saddle River: Pearson Education, 2009, pp. 159-164.
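The recursion described in the quote can be sketched as follows. This assumes sigmoid hidden and output units, a cross-entropy cost summed over the outputs, no biases, and hypothetical names such as `hidden_deltas`. Each hidden delta is built from the deltas of the output units it feeds and is checked against a finite-difference derivative of the cost with respect to that hidden unit's pre-activation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_from_hidden(z_hidden, W_out):
    # W_out[k][j] is the weight from hidden unit j to output unit k (biases omitted)
    h = [sigmoid(z) for z in z_hidden]
    a = [sigmoid(sum(w_kj * h_j for w_kj, h_j in zip(row, h))) for row in W_out]
    return h, a

def cost_from_hidden(z_hidden, W_out, y):
    _, a = forward_from_hidden(z_hidden, W_out)
    return -sum(yk * math.log(ak) + (1 - yk) * math.log(1 - ak) for yk, ak in zip(y, a))

def hidden_deltas(z_hidden, W_out, y):
    h, a = forward_from_hidden(z_hidden, W_out)
    # Output layer: the desired response is known, so the error signal is direct
    delta_out = [ak - yk for ak, yk in zip(a, y)]
    # Hidden layer: no desired response, so combine the deltas of every output
    # unit this neuron feeds, then multiply by its own derivative h(1 - h)
    return [
        sum(W_out[k][j] * delta_out[k] for k in range(len(W_out))) * h[j] * (1 - h[j])
        for j in range(len(h))
    ]

z_hidden = [0.4, -1.1]
W_out = [[0.7, -0.2], [-0.5, 0.9], [0.1, 0.3]]
y = [1.0, 0.0, 1.0]

eps = 1e-5
numeric = []
for j in range(len(z_hidden)):
    zp, zm = list(z_hidden), list(z_hidden)
    zp[j] += eps
    zm[j] -= eps
    numeric.append((cost_from_hidden(zp, W_out, y) - cost_from_hidden(zm, W_out, y)) / (2 * eps))

analytic = hidden_deltas(z_hidden, W_out, y)
print(numeric, analytic)
```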

Found my answer here. The output delta does indeed need to be multiplied by the derivative of the activation, as in

d = (a - y) * g'(z)

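However, Ng uses the cross-entropy cost function. With a sigmoid output, the derivative of that cost with respect to the activation is (a - y) / (a(1 - a)), which cancels g'(z) = a(1 - a) and leaves the d = a - y computation shown in the video. If you use a mean squared error cost instead, the derivative of the activation function must stay in the delta.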
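The mean-squared-error case can be checked the same way. This sketch assumes one sigmoid output unit and the cost $\frac{1}{2}(a - y)^2$, with arbitrary sample values. Under that cost, only `(a - y) * a * (1 - a)` agrees with the numerical derivative with respect to $z$, and plain `a - y` does not.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse(z, y):
    # Squared-error cost for one sigmoid output unit (1/2 factor for convenience)
    a = sigmoid(z)
    return 0.5 * (a - y) ** 2

def delta_with_gprime(z, y):
    a = sigmoid(z)
    return (a - y) * a * (1 - a)  # d = (a - y) * g'(z)

def delta_without_gprime(z, y):
    return sigmoid(z) - y         # d = a - y, only correct for cross-entropy

z, y, eps = 0.8, 0.0, 1e-6
numeric = (mse(z + eps, y) - mse(z - eps, y)) / (2 * eps)
print(numeric, delta_with_gprime(z, y), delta_without_gprime(z, y))
```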

This link explains the intuition and the math behind backpropagation.

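The cross-entropy cost function used by Andrew Ng is defined as: $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\left(h_{\theta}(x^{(i)})_k\right)+\left(1-y_k^{(i)}\right)\log\left(1-h_{\theta}(x^{(i)})_k\right)\right]$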
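For reference, here is the cost written out in plain Python, assuming the form $-\frac{1}{m}\sum_i\sum_k\left[y\log h + (1-y)\log(1-h)\right]$ without a regularization term. `H` and `Y` are hypothetical $m \times K$ lists of predicted probabilities and 0/1 labels.

```python
import math

def cross_entropy_cost(H, Y):
    # H[i][k]: predicted probability h_theta(x^(i))_k, must lie strictly in (0, 1)
    # Y[i][k]: target label y_k^(i)
    m = len(H)
    total = 0.0
    for h_row, y_row in zip(H, Y):
        for h, y in zip(h_row, y_row):
            total += y * math.log(h) + (1 - y) * math.log(1 - h)
    return -total / m

print(cross_entropy_cost([[0.9, 0.1], [0.2, 0.8]], [[1, 0], [0, 1]]))
```

Note the second term is $\log(1-h)$, which is what the derivation differentiates when it produces the factor $\frac{1-y}{1-h_{\theta}(x)}$.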

Differentiating the bracketed term for a single example and output unit with respect to the last-layer parameters θ, we get:

$\left(y\log(h_{\theta}(x))+\left(1-y\right)\log\left(1-h_{\theta}(x)\right)\right)'$

$=\left(y\log(h_{\theta}(x))\right)'+\left(\left(1-y\right)\log\left(1-h_{\theta}(x)\right)\right)'$

$=\frac{y}{h_{\theta}(x)} (h_{\theta}(x))'+\frac{1-y}{1-h_{\theta}(x)} (1-h_{\theta}(x))'$

$=\frac{y}{h_{\theta}(x)} (\sigma(z^{(L)}))' (z^{(L)})'+\frac{1-y}{1-h_{\theta}(x)} (-(\sigma(z^{(L)}))'(z^{(L)})')$

$=\left(\frac{y}{h_{\theta}(x)}-\frac{(1-y)}{1-h_{\theta}(x)} \right)(\sigma(z^{(L)}))'(z^{(L)})'$

See the derivative of σ(z) at the end of this post; substituting it gives:

$=\left(\frac{y}{h_{\theta}(x)}-\frac{(1-y)}{1-h_{\theta}(x)} \right)\sigma(z^{(L)}) (1-\sigma(z^{(L)}))(z^{(L)})'$

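In the last layer $L$ we have $a^{(L)} = \sigma(z^{(L)}) = h_\theta(x)$, so:

$=\left(\frac{y}{h_{\theta}(x)}-\frac{(1-y)}{1-h_{\theta}(x)} \right)h_\theta(x)(1-h_\theta(x)) (z^{(L)})'$

Multiplying out:

$=\left(y(1-h_\theta(x))-(1-y)h_\theta(x) \right)(z^{(L)})'$

$=\left(y-h_\theta(x) \right)(z^{(L)})'$

Because Ng's cost has a minus sign in front of the sums, the derivative of the cost with respect to $z^{(L)}$ for one example is $h_\theta(x) - y = a^{(L)} - y$, which is exactly the output delta used in the video. The $\sigma'(z^{(L)})$ factor has already cancelled, so multiplying by it again gives the wrong gradient.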
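The end result of the derivation can be sanity-checked numerically. This sketch (function names and sample points are arbitrary) compares a finite-difference derivative of the bracketed term with respect to $z$ against the closed form $y - \sigma(z)$, for hard and soft labels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bracket(z, y):
    # y*log(h) + (1 - y)*log(1 - h) with h = sigmoid(z)
    h = sigmoid(z)
    return y * math.log(h) + (1 - y) * math.log(1 - h)

def max_mismatch(eps=1e-6):
    worst = 0.0
    for z in (-3.0, -0.5, 0.0, 1.2, 2.5):
        for y in (0.0, 0.3, 1.0):
            numeric = (bracket(z + eps, y) - bracket(z - eps, y)) / (2 * eps)
            derived = y - sigmoid(z)  # closed form from the derivation
            worst = max(worst, abs(numeric - derived))
    return worst

print(max_mismatch())
```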

For the derivative of σ(z), we get:

$\sigma'(z) = \left(\frac{1}{1+e^{(-z)}}\right)' = \frac{1'\cdot(1+e^{(-z)})-1\cdot(1+e^{(-z)})'}{(1+e^{(-z)})^{2}} = \frac{e^{(-z)}}{(1+e^{(-z)})^{2}}$

$=\frac{e^{(-z)}+1-1}{(1+e^{(-z)})^{2}}=\frac{1}{1+e^{(-z)}} - \left(\frac{1}{1+e^{(-z)}}\right)^{2} = \sigma(z) (1-\sigma(z))$
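A quick numerical check of $\sigma'(z) = \sigma(z)(1-\sigma(z))$, comparing it with the intermediate form $e^{-z}/(1+e^{-z})^{2}$ and with a centered finite difference; the sample points are arbitrary.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)  # sigma(z) * (1 - sigma(z))

def sigmoid_prime_direct(z):
    # e^{-z} / (1 + e^{-z})^2, the intermediate form in the derivation
    e = math.exp(-z)
    return e / (1 + e) ** 2

zs = [-4.0, -1.0, 0.0, 0.5, 3.0]
print([round(sigmoid_prime(z), 6) for z in zs])
```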
