How to use TensorBoard to analyze the results and reduce the mean square error

Question

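In TensorFlow, I am trying to build a model to perform image super-resolution (i.e. a regression task) and analyze the results using TensorBoard. During training, I found that the mean square error (MSE) bounces between 100 and 200 most of the time (even from the beginning) and never converges. I would like to add the following variables to tf.summary and analyze what is causing this problem.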

graph_loss = get_graph_mean_square_error()
tf.summary.scalar('graph_loss', graph_loss)

regularization_loss = tf.add_n([tf.nn.l2_loss(weight) for weight in weights]) * regularization_param
tf.summary.scalar('reg_loss', regularization_loss)

tf.summary.scalar('overall_loss', regularization_loss + graph_loss)

for index in range(len(weights)):
    tf.summary.histogram("weight[%02d]" % index, weights[index])

optimizer = tf.train.AdamOptimizer()
# Note: grad_and_vars is not defined in this snippet (assumed to come from optimizer.compute_gradients).
capped_grad_and_vars = [(tf.clip_by_value(grad, -clip_value, clip_value), var) for grad, var in grad_and_vars if grad is not None]
train_optimizer = optimizer.apply_gradients(capped_grad_and_vars, global_step)

for grad, var in grad_and_vars:
    if grad is not None:  # skip variables that receive no gradient
        tf.summary.histogram(var.name + '/gradient', grad)

for grad, var in capped_grad_and_vars:
    tf.summary.histogram(var.name + '/capped_gradient', grad)

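The model is a ResNet with skip connections that contains several repeated [convolution -> batch normalization -> ReLU] layers. In the Distributions tab, I can see that several graphs were added with the following patterns: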

BatchNorm_[number]/beta0/capped_gradient
BatchNorm_[number]/beta0/gradient
BatchNorm_[number]/gamma0/capped_gradient
BatchNorm_[number]/gamma0/gradient
bias[number]_0/capped_gradient
bias[number]_0/gradient
weight_[number]_
weight_[number]_0/capped_gradient
weight_[number]_0/gradient

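There are a few things I am looking at, and I would like someone to shed some light on them:

Using L2 loss for regularization

The value of regularization_param was set to 0.0001, and the reg_loss graph shows that it increased from 1.5 (like a logarithm) and converged at around 3.5. In my case, graph_loss is between 100 and 200 while reg_loss is between 1.5 and 3.5.

Is the trend of the reg_loss graph what we are looking for (like a logarithmically increasing function)?
Would the reg_loss be too small to penalize the model (100-200 vs 1.5-3.5)?
How do I know if I chose regularization_param correctly?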

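Addressing the vanishing gradient problem

I think the MSE bouncing from the beginning to the end might be due to the vanishing gradient problem. I was hoping that using several techniques, such as a ResNet with skip connections, batch normalization, and gradient clipping (clip_by_value at 0.05), would address the vanishing gradient problem. I am not quite sure how to read the chart, but it looks to me as though the weights in the first 22 layers do not change during the first 20K steps (I am not familiar with TensorBoard, so please correct me if I read/interpret it incorrectly):

I have split the training into several runs and restored the checkpoint from the previous run. Here is the graph after 66K steps for the last few layers:

You can see that during the first 20K steps the weights still change in some layers, such as weight_36_ and weight_37_ in orange. However, after 50K steps, all the weights look flat, like weight_36_ (very thin) and weight_39_ (with little thickness) in green.

Then I looked at the batch normalization graphs (note that capped_gradient is clip_by_value at 0.05), and it looks like there are some changes, as shown below:

Please, can someone explain if the above graph looks correct? (I do not understand why after each batch normalization there are some good values but the weights do not seem to change)
Which direction should I look at to address the MSE bouncing problem from the beginning to the end?

Any other suggestions are welcome :)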

Answer 1

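Things to try:

Remove gradient clipping: You are clipping the gradient values at 0.05. I think update = (0.05 * learning rate) yields very low weight updates, which is why most of the layers are not learning anything. If you clip the gradients of the last layer (the first from the output) to 0.05, then very low gradient values propagate back to its previous layer, and multiplication with the local gradients yields even lower gradient values. Thus you would probably see only the last few layers learning something.
Remove L2 regularization: Try removing the regularization. If removing it solves the bouncing MSE problem, then you should tune the regularization parameter very carefully.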
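
To make the arithmetic behind the first suggestion concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The gradient values are made up for illustration, and the learning rate of 0.001 is an assumption (it is the default of tf.train.AdamOptimizer, which the question constructs without arguments). The bound shown applies to a plain SGD step; Adam rescales gradients by its running moment estimates, so its actual step size can differ.

```python
def clip_by_value(values, low, high):
    """Clamp each element to [low, high], mimicking tf.clip_by_value on a flat list."""
    return [max(low, min(high, v)) for v in values]


clip_value = 0.05      # clipping threshold from the question
learning_rate = 0.001  # assumption: default learning rate of tf.train.AdamOptimizer

# Hypothetical raw gradients, chosen only to show small and large magnitudes.
raw_gradients = [-3.0, -0.02, 0.0, 0.04, 12.5]
capped = clip_by_value(raw_gradients, -clip_value, clip_value)

# For plain SGD the per-weight change is learning_rate * gradient,
# so clipping bounds it by learning_rate * clip_value.
sgd_updates = [learning_rate * g for g in capped]
max_sgd_step = max(abs(u) for u in sgd_updates)

print("capped gradients:", capped)
print("largest SGD step:", max_sgd_step)
```

Logging the ratio of update magnitude to weight magnitude per layer is one way to check whether a bound like this is actually what keeps the weights flat.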

Answer 2

Is the trend of the reg_loss graph what we are looking for (like a logarithmically increasing function)?

Yes, it looks okay.

Would the reg_loss be too small to penalize the model (100-200 vs 1.5-3.5)?

How do I know if I choose regularization_param correctly?

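First, I would suggest varying the learning rate from 0.001 to 0.1 (this is the first thing to do when investigating the gradient clipping problem), and observing whether the average MSE decreases, so that you can choose a learning rate without reg_loss. Then you can add regularization back by fine-tuning reg_loss.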
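
As a rough way to put the two losses on the same scale, the sketch below uses only the numbers quoted in the question (regularization_param = 0.0001, reg_loss between 1.5 and 3.5, graph_loss between 100 and 200) and the fact that tf.nn.l2_loss(t) computes sum(t ** 2) / 2. The helper names are hypothetical and not part of any library.

```python
def reg_share(reg_loss, graph_loss):
    """Fraction of the overall loss contributed by the regularization term."""
    return reg_loss / (reg_loss + graph_loss)


def implied_sum_of_squares(reg_loss, regularization_param):
    """Invert reg_loss = regularization_param * sum(w ** 2) / 2 to get sum(w ** 2)."""
    return 2.0 * reg_loss / regularization_param


regularization_param = 0.0001  # value stated in the question

# Extremes of the ranges reported in the question.
smallest_share = reg_share(1.5, 200.0)
largest_share = reg_share(3.5, 100.0)

# Total squared weight magnitude implied by the converged reg_loss of 3.5.
final_sum_of_squares = implied_sum_of_squares(3.5, regularization_param)

print("regularization share of overall loss:", smallest_share, "to", largest_share)
print("implied sum of squared weights:", final_sum_of_squares)
```

With these numbers the regularization term makes up roughly 0.7% to 3.4% of the overall loss. This only compares magnitudes; training is driven by the gradient of each term, so the share is a rough indicator at best.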

Please, can someone explain if the above graph looks correct? (I do not understand why after each batch normalization there are some good values but the weights do not seem to change)

Which direction should I look at to address the MSE bouncing problem from the beginning to the end?

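Please double-check that you are taking the average MSE for each epoch. Sometimes it can be normal to observe the bouncing problem within each sub-epoch. But if you take the average MSE per epoch, you may observe that it goes down gradually.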
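
The per-epoch averaging described above can be sketched in plain Python as follows. The per-batch MSE values and the steps_per_epoch of 4 are made up for illustration, and epoch_means is a hypothetical helper, not a TensorFlow function.

```python
def epoch_means(batch_mse, steps_per_epoch):
    """Average per-batch MSE values over consecutive chunks of steps_per_epoch."""
    means = []
    for start in range(0, len(batch_mse), steps_per_epoch):
        chunk = batch_mse[start:start + steps_per_epoch]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical per-batch MSE: bounces widely inside each epoch,
# while the epoch-level average drifts downward.
batch_mse = [
    180.0, 120.0, 200.0, 140.0,  # epoch 0
    170.0, 110.0, 190.0, 130.0,  # epoch 1
    160.0, 100.0, 180.0, 120.0,  # epoch 2
]

per_epoch = epoch_means(batch_mse, steps_per_epoch=4)
print(per_epoch)  # [160.0, 150.0, 140.0]
```

Writing the per-epoch value as its own scalar summary, next to the per-batch one, makes it easier to tell batch noise from a real lack of convergence.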

machine-learning

tensorflow

tensorboard