Batch normalization when batch size = 1

Question

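What happens when I use batch normalization but set batch_size = 1?

Because I am using 3D medical images as the training dataset, GPU memory limits force me to set the batch size to 1. As far as I know, when batch_size = 1 the variance will be 0, and (x-mean)/variance will then fail because of division by zero.

But why did no error occur when I set batch_size = 1? Why did my network train as well as I expected? Could anyone explain it?

Some people argued: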

The ZeroDivisionError may not be encountered because of two cases. First, the exception is caught in a try catch block. Second, a small rational number is added ( 1e-19 ) to the variance term so that it is never zero.

But some people disagree. They said:

You should calculate mean and std across all pixels in the images of the batch. (So even batch_size = 1, there are still a lot of pixels in the batch. So the reason why batch_size=1 can still work is not because of 1e-19)

I checked the PyTorch source code, and from the code I think the latter explanation is right.

Does anyone have a different opinion?

Answer 1

variance will be 0

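No, it won't; BatchNormalization computes statistics only with respect to a single axis (usually the channels axis, axis=-1 (last) by default); every other axis is collapsed, i.e. summed over for averaging. Details below.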

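More importantly, however, unless you can explicitly justify it, I advise against using BatchNormalization with batch_size=1; there are strong theoretical reasons against it, and multiple publications have shown BN performance degrading for batch_size under 32, and severely for <=8. In a nutshell, batch statistics "averaged" over a single sample vary greatly from sample to sample (high variance), and the BN mechanisms do not work as intended.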
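To make the "high variance" point concrete, here is a minimal NumPy sketch. The data model is an assumption made purely for illustration (each sample gets its own random offset plus per-pixel noise), and `batch_mean_spread` is a hypothetical helper, not part of any library. It estimates how much the batch mean fluctuates from batch to batch at different batch sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_mean_spread(batch_size, n_batches=2000, pixels=64):
    """Std of the batch mean across many simulated batches (toy data model)."""
    # Assumed data model: a per-sample offset (sample-to-sample variation)
    # plus independent per-pixel noise.
    offsets = rng.normal(size=(n_batches, batch_size, 1))
    noise = rng.normal(size=(n_batches, batch_size, pixels))
    # One mean per batch, taken over both the batch and the pixel axes.
    batch_means = (offsets + noise).mean(axis=(1, 2))
    return batch_means.std()

spread = {bs: batch_mean_spread(bs) for bs in (1, 8, 32)}
for bs, s in spread.items():
    print(f"batch_size={bs:2d}  std of batch mean ~ {s:.3f}")
```

Under this toy model the spread shrinks roughly like 1/sqrt(batch_size), so with batch_size=1 the statistic BN normalizes with is much noisier than with 32.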

Small mini-batch alternatives: Batch Renormalization -- Layer Normalization -- Weight Normalization

Implementation details: from the source code:

reduction_axes = list(range(len(input_shape)))
del reduction_axes[self.axis]

Eventually, tf.nn.moments is called with axes=reduction_axes, which performs a reduce_sum to compute variance. Then, in the TensorFlow backend, mean and variance are passed to tf.nn.batch_normalization to return train- or inference-normalized inputs.

In other words, if your input is (batch_size, height, width, depth, channels), or (1, height, width, depth, channels), then BN computes its statistics over the 1, height, width, and depth dimensions.
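As a rough illustration of that reduction, the following NumPy sketch mimics the axis bookkeeping from the excerpt above on a hypothetical (1, 8, 8, 8, 4) volume. It is not the Keras implementation; the shape and the random data are assumptions chosen only to show that per-channel variance is non-zero even with a single sample.

```python
import numpy as np

# Hypothetical 3D-volume input: (batch, height, width, depth, channels).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(1, 8, 8, 8, 4))

axis = -1  # channel axis (the default mentioned above)

# Same bookkeeping as the excerpt: keep every axis except the channel axis.
reduction_axes = list(range(x.ndim))
del reduction_axes[axis]  # leaves [0, 1, 2, 3]

# One mean and one variance per channel, computed over 1*8*8*8 = 512 values.
mean = x.mean(axis=tuple(reduction_axes))
var = x.var(axis=tuple(reduction_axes))

print(reduction_axes)
print(mean.shape, var.shape)
print(var)  # strictly positive for non-constant data
```

Each channel's statistics are pooled over 512 voxels here, which is why batch_size=1 does not by itself produce a zero variance.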

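Can variance ever be zero? Yes, if every single data point for any given channel slice (along every dimension) is the same. But this should be near-impossible for real data.

Other answers: the first one is misleading: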

a small rational number is added (1e-19) to the variance

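This doesn't occur when computing the variance; rather, epsilon is added to the variance when normalizing. Nonetheless, it is rarely necessary, as the variance is far from zero. Also, Keras actually defaults the epsilon term to 1e-3; it plays a role in regularizing, beyond merely avoiding division by zero.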
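A small sketch of where epsilon enters, assuming the usual normalization form (x - mean) / sqrt(var + eps). The `normalize` function below is a hypothetical helper written for this post: it omits the learned scale/offset and the moving averages that a real BN layer keeps, and the eps=1e-3 default simply mirrors the Keras default mentioned above.

```python
import numpy as np

def normalize(x, eps=1e-3):
    """Normalize over every axis except the last (channel) axis."""
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)  # no epsilon here
    return (x - mean) / np.sqrt(var + eps)  # epsilon is added at this step

# Channel 0 is constant (variance exactly 0); channel 1 varies.
x = np.ones((1, 4, 4, 2))
x[..., 1] = np.arange(16.0).reshape(1, 4, 4)

y = normalize(x)
print(np.isfinite(y).all())  # no division by zero, even for channel 0
print(y[..., 0].ravel())     # constant channel maps to zeros
print(y[..., 1].std())       # slightly below 1 because of eps
```

The constant channel is the degenerate case where epsilon actually matters; for the varying channel it only nudges the output scale slightly below 1.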

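Update: I failed to address an important intuition behind suspecting the variance to be 0. Indeed, the variance of the batch statistics is zero, since there is only one statistic; but the "statistic" itself is the mean and variance over the channel's batch and spatial dimensions. In other words, the variance of the mean and variance (of the single training sample) is zero, but the mean and variance themselves are not.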
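The distinction in the update can be checked numerically. This is only a sketch on assumed random data with a hypothetical (1, 8, 8, 8, 4) shape: it compares the spread of the per-sample means across the batch axis with the variance inside the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 8, 8, 4))  # a single sample, 4 channels

# One mean per sample and channel: shape (1, 4).
per_sample_mean = x.mean(axis=(1, 2, 3))

# Variance of that statistic across the batch axis: only one sample, so 0.
spread_across_batch = per_sample_mean.var(axis=0)

# Variance BN actually uses: over batch and spatial axes, per channel.
within_sample_var = x.var(axis=(0, 1, 2, 3))

print(spread_across_batch)  # zeros
print(within_sample_var)    # positive values
```

The first quantity is what the "variance will be 0" intuition refers to; the second is what the layer divides by.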

Answer 2

when batch_size = 1, variance will be 0

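No, because when you compute the mean and variance for BN (for example using tf.nn.moments) you compute them over axes [0, 1, 2] (assuming tensors in NHWC channel order).

From the "Group Normalization" paper: https://arxiv.org/pdf/1803.08494.pdf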

With batch_size=1, batch normalization is equivalent to instance normalization, which can be helpful in some tasks.
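The equivalence is easy to check with a NumPy sketch. The two functions below are simplified stand-ins written for this post (training-mode statistics only, no learned scale/offset, hypothetical eps value), not the framework layers; at inference a real BN layer would use its moving averages instead, so the equivalence applies to training-time statistics.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Statistics over batch and spatial axes (N, H, W), one set per channel.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm(x, eps=1e-5):
    # Statistics over spatial axes (H, W), one set per sample and channel.
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
single = rng.normal(size=(1, 16, 16, 3))  # NHWC with N = 1
pair = rng.normal(size=(2, 16, 16, 3))    # NHWC with N = 2

same_for_one = np.allclose(batch_norm(single), instance_norm(single))
same_for_two = np.allclose(batch_norm(pair), instance_norm(pair))
print(same_for_one, same_for_two)
```

With N = 1 the batch axis contributes nothing to the reduction, so both functions pool over exactly the same values; with N = 2 they no longer do.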

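But if you are using some kind of encoder-decoder and some layer has a tensor with spatial size 1x1, this will be a problem: each channel has only one value, so the mean equals that value, and BN will zero out the information.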
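A minimal sketch of that failure case, assuming an NHWC tensor of shape (1, 1, 1, 3) with made-up values and a hypothetical eps of 1e-3:

```python
import numpy as np

# Bottleneck-like tensor: batch 1, spatial size 1x1, three channels.
x = np.array([[[[3.0, -7.0, 0.5]]]])  # shape (1, 1, 1, 3)

# Per-channel statistics over (N, H, W): each channel has a single value.
mean = x.mean(axis=(0, 1, 2), keepdims=True)  # equals x itself
var = x.var(axis=(0, 1, 2), keepdims=True)    # all zeros

y = (x - mean) / np.sqrt(var + 1e-3)
print(y)  # every channel collapses to 0
```

Whatever the input values are, the normalized output is all zeros, so only the layer's learned offset (if any) would reach the next layer.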

Tags: python, deep-learning, keras, tensorflow, batch-normalization