Dropout with densely connected layer

Question

I am using a DenseNet model for one of my projects and am having some difficulty with regularization.

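Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model.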

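So I decided to use dropout to avoid overfitting. When using dropout, both validation and training loss decrease to about 0.13 during the first epoch and then remain constant for about 10 epochs.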

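After that, both losses decrease in the same way as without dropout, resulting in overfitting again. The final loss value is in about the same range as without dropout.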

So for me it seems like dropout is not really working.

If I switch to L2 regularization, though, I am able to avoid overfitting, but I would rather use dropout as the regularizer.

Now I am wondering whether anyone has experienced this kind of behaviour?

I use dropout in both the dense block (bottleneck layer) and the transition block (dropout rate = 0.5):

def bottleneck_layer(self, x, scope):
    # dropout_rate is defined outside this snippet (0.5 in these experiments)
    with tf.name_scope(scope):
        # BN -> ReLU -> 1x1 conv producing 4 * self.filters channels, then dropout
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=4 * self.filters, kernel=[1,1], layer_name=scope+'_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)

        # BN -> ReLU -> 3x3 conv producing self.filters channels, then dropout
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch2')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[3,3], layer_name=scope+'_conv2')
        x = Drop_out(x, rate=dropout_rate, training=self.training)

        return x

def transition_layer(self, x, scope):
    with tf.name_scope(scope):
        # BN -> ReLU -> 1x1 conv, then dropout
        x = Batch_Normalization(x, training=self.training, scope=scope+'_batch1')
        x = Relu(x)
        x = conv_layer(x, filter=self.filters, kernel=[1,1], layer_name=scope+'_conv1')
        x = Drop_out(x, rate=dropout_rate, training=self.training)
        # 2x2 average pooling with stride 2 halves the spatial resolution
        x = Average_pooling(x, pool_size=[2,2], stride=2)

        return x

Answer 1

Without any regularization, both validation and training loss (MSE) decrease. The training loss drops faster though, resulting in some overfitting of the final model.

This is not overfitting.

Overfitting starts when your validation loss starts increasing while your training loss continues to decrease; this is its telltale signature:

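(Image adapted from the Wikipedia entry on overfitting; different things may lie on the horizontal axis, e.g. the depth or number of boosted trees, the number of neural-net fitting iterations, etc.)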
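To make that signature concrete, here is a minimal sketch that scans two per-epoch loss histories for it. The function name `overfitting_onset` and the `patience` parameter are made up for illustration (they are not from any library), and the loss values are invented; real curves are noisier, so treat this as a rough check rather than a definitive test.

```python
def overfitting_onset(train_loss, val_loss, patience=3):
    """Return the last epoch before validation loss rises for `patience`
    consecutive epochs while training loss keeps falling, or None if that
    pattern never occurs. `patience` is a hypothetical knob."""
    n = min(len(train_loss), len(val_loss))
    for epoch in range(n - patience):
        window = range(epoch + 1, epoch + patience + 1)
        val_rising = all(val_loss[i] > val_loss[i - 1] for i in window)
        train_falling = all(train_loss[i] < train_loss[i - 1] for i in window)
        if val_rising and train_falling:
            return epoch
    return None


# Invented curves: validation loss turns upward after epoch 3.
overfit_train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2]
overfit_val = [1.1, 0.9, 0.8, 0.75, 0.8, 0.85, 0.9]

# Invented curves resembling the question: both fall, training falls faster.
gap_train = [1.0, 0.7, 0.5, 0.35, 0.25, 0.18, 0.12]
gap_val = [1.1, 0.9, 0.75, 0.65, 0.58, 0.53, 0.50]

onset_a = overfitting_onset(overfit_train, overfit_val)
onset_b = overfitting_onset(gap_train, gap_val)
```

With these numbers the first pair reports epoch 3 as the turning point, while the second pair (both losses still decreasing) reports nothing, which is the situation described in the question.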

The (generally expected) difference between training and validation loss is something completely different, called the generalization gap:

An important concept for understanding generalization is the generalization gap, i.e., the difference between a model’s performance on training data and its performance on unseen data drawn from the same distribution.

where, practically speaking, validation data is indeed unseen data.
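Written out, the quoted definition could look like the following. The notation is my own assumption, not taken from the quoted source: $f$ is the trained model, $\ell$ the per-example loss (MSE here), $\mathcal{D}$ the data distribution, and $(x_i, y_i)$ the $n$ training examples. The sign is chosen so that a positive gap means the model does worse on unseen data.

```latex
\[
  \operatorname{gap}(f)
  \;=\;
  \underbrace{\mathbb{E}_{(x,y)\sim\mathcal{D}}\bigl[\ell(f(x),y)\bigr]}_{\text{loss on unseen data}}
  \;-\;
  \underbrace{\frac{1}{n}\sum_{i=1}^{n}\ell\bigl(f(x_i),y_i\bigr)}_{\text{training loss}}
  \;\approx\;
  L_{\text{val}} - L_{\text{train}}
\]
```

In practice the expectation is estimated by the validation loss $L_{\text{val}}$. A positive gap while $L_{\text{val}}$ is still decreasing is the situation described in the question, and by the definition above it is not overfitting.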

So for me it seems like dropout is not really working.

That may very well be the case; dropout is not expected to work always and for every problem.

Answer 2

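Interesting question,
I would suggest plotting the validation loss and the training loss to see whether the model is really overfitting. If you see that the validation loss stays flat while the training loss keeps dropping (you will probably also see a large gap between them), then it is overfitting.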

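If it is overfitting, try reducing the number of layers or the number of nodes (and tweak the dropout rate a little after you do that). Reducing the number of epochs can help as well.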
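One common way to reduce the number of epochs without guessing the number up front is early stopping: stop once the validation loss has not improved for a few epochs. Below is a minimal, framework-independent sketch. The class name, `patience` and `min_delta` are illustrative choices, and the loss values are invented.

```python
class EarlyStopping:
    """Signal a stop once validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs (illustrative helper)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Invented validation losses standing in for a real training loop.
val_history = [0.50, 0.30, 0.20, 0.15, 0.16, 0.17, 0.18, 0.19]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_history):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break
```

In a real loop you would also save a checkpoint whenever `best_epoch` changes and restore it after stopping. If you train through Keras `model.fit`, the built-in `tf.keras.callbacks.EarlyStopping` callback covers the same idea.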

If you want to use a method other than dropout, I would suggest a Gaussian noise layer.
Keras - https://keras.io/layers/noise/
TensorFlow - https://www.tensorflow.org/api_docs/python/tf/keras/layers/GaussianNoise
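For intuition, this is roughly what such a layer does: it adds zero-mean Gaussian noise to the activations during training and passes them through unchanged at inference. The pure-Python sketch below only illustrates that behaviour; it is not the Keras implementation, and the function name and arguments are made up for the example.

```python
import random


def gaussian_noise(values, stddev, training, rng=None):
    """Add zero-mean Gaussian noise to each activation when training;
    return an unchanged copy at inference time (illustrative helper)."""
    if not training or stddev == 0.0:
        return list(values)
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, stddev) for v in values]


activations = [0.5, -1.2, 3.0]
# Training mode: each value is perturbed (seeded here for repeatability).
noisy = gaussian_noise(activations, stddev=0.1, training=True, rng=random.Random(0))
# Inference mode: values pass through untouched.
clean = gaussian_noise(activations, stddev=0.1, training=False)
```

In a Keras model the equivalent would be to place `tf.keras.layers.GaussianNoise(stddev)` where the dropout layer was; `stddev` then becomes the hyperparameter to tune in place of the dropout rate.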

machine-learning

deep-learning

keras

tensorflow

densenet