Why do we want to scale outputs when using dropout?

From the dropout paper:

"The idea is to use a single neural net at test time without dropout. The weights of this network are scaled-down versions of the trained weights. If a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time as shown in Figure 2. This ensures that for any hidden unit the expected output (under the distribution used to drop units at training time) is the same as the actual output at test time."

Why do we want to preserve the expected output? If we use ReLU activations, linearly scaling the weights or activations just scales the network's outputs linearly, and that should have no effect on classification accuracy.

What am I missing?

To be precise, what we want to preserve is not the "expected output" but the expected value of the output. In other words, we want to compensate for the difference between training (where we do not pass values through some of the nodes) and the test phase by keeping the average (expected) value of the outputs the same.
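A one-line check of that expectation argument: for a unit with activation $x$ that is retained with probability $p$ during training,

$$\mathbb{E}[\text{training output}] = p \cdot x + (1 - p) \cdot 0 = p\,x,$$

which is exactly what multiplying the unit's output by $p$ at test time reproduces.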

In the case of ReLU activations, this scaling does indeed lead to a linear scaling of the outputs (when they are positive), but why do you think it would not affect the final accuracy of a classification model? At the end, at least, we usually apply a softmax or a sigmoid, which are nonlinear and do depend on this scaling.
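A quick sketch of that last point, with hypothetical logits: uniformly rescaling the input to a softmax changes the predicted probabilities (the argmax happens to survive a uniform rescaling of the logits themselves, but once mis-scaled hidden activations pass through further weights and biases, even the predicted class can shift):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5])  # hypothetical pre-softmax scores

print(softmax(logits))        # ~ [0.63, 0.23, 0.14]
print(softmax(0.5 * logits))  # ~ [0.48, 0.29, 0.23] -- a different distribution
```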