Why does batch_normalization produce all-zero output when training = True but non-zero output when training = False?

I am following the TensorFlow tutorial https://www.tensorflow.org/guide/migrate. Here is an example:

import tensorflow as tf
import tensorflow.compat.v1 as v1

def model(x, training, scope='model'):
  with v1.variable_scope(scope, reuse=v1.AUTO_REUSE):
    x = v1.layers.conv2d(x, 32, 3, activation=v1.nn.relu,
          kernel_regularizer=lambda x:0.004*tf.reduce_mean(x**2))
    x = v1.layers.max_pooling2d(x, (2, 2), 1)
    x = v1.layers.flatten(x)
    x = v1.layers.dropout(x, 0.1, training=training)
    x = v1.layers.dense(x, 64, activation=v1.nn.relu)
    # training=True normalizes with the current batch's statistics;
    # training=False normalizes with the stored moving averages.
    x = v1.layers.batch_normalization(x, training=training)
    x = v1.layers.dense(x, 10)
    return x

# Both inputs are a single sample (batch size 1) filled with ones.
train_data = tf.ones(shape=(1, 28, 28, 1))
test_data = tf.ones(shape=(1, 28, 28, 1))
train_out = model(train_data, training=True)
test_out = model(test_data, training=False)

train_out, with training=True, is all zeros:

tf.Tensor([[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]], shape=(1, 10), dtype=float32)

test_out, with training=False, is instead a random-looking non-zero vector:

tf.Tensor(
[[ 0.379358   -0.55901194  0.48704922  0.11619566  0.23902717  0.01691487
   0.07227738  0.14556988  0.2459927   0.2501198 ]], shape=(1, 10), dtype=float32)


Why does batch_normalization produce all-zero output when training = True

This is because your batch size here is 1.


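With a batch size of 1, after flattening, each feature (channel) has only a single value in the batch, so the batch mean for that feature is that value itself. Subtracting the mean leaves zero, so the batch normalization layer outputs a zero tensor, and the final dense layer, whose bias is initialized to zero by default, maps it to an all-zero output.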
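The effect can be reproduced without TensorFlow. The following is a minimal pure-Python sketch of training-mode batch normalization, not the library's implementation; the helper name batch_norm_train and the sample values are made up for illustration, and gamma = 1, beta = 0, eps = 1e-3 are assumed to mirror the layer's defaults.

```python
import math

def batch_norm_train(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize every feature with the mean/variance of the current batch.

    batch is a list of samples; each sample is a list of feature values.
    (Hypothetical helper for illustration only.)
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [sample[j] for sample in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            out[i][j] = gamma * (batch[i][j] - mean) / math.sqrt(var + eps) + beta
    return out

# Batch size 1: each feature's batch mean equals the feature value itself,
# so (x - mean) is 0 for every feature, whatever the input values are.
single = [[0.7, 1.3, 0.0, 2.5]]
normalized_single = batch_norm_train(single)
print(normalized_single)  # [[0.0, 0.0, 0.0, 0.0]]
```

Note that the zeros do not depend on the input values at all: any single-sample batch is mapped to beta, which is 0 here.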

but produce non-zero output when training = False?

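During inference, the batch normalization layer normalizes its input with the moving averages of the batch mean and standard deviation instead of the current batch's mean and standard deviation. Those moving statistics do not depend on the current input, so the output is not forced to zero.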
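As a rough illustration of the inference path, here is a pure-Python sketch, again not the library's code. The helper batch_norm_inference is hypothetical, and the moving mean of 0 and moving variance of 1 are assumed because those are the layer's default initial values when no training updates have been applied.

```python
import math

def batch_norm_inference(batch, moving_mean, moving_var,
                         gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize with stored moving statistics instead of batch statistics.

    (Hypothetical helper for illustration only.)
    """
    return [
        [gamma * (x - m) / math.sqrt(v + eps) + beta
         for x, m, v in zip(sample, moving_mean, moving_var)]
        for sample in batch
    ]

# Same single-sample batch as in training mode, but now the statistics are
# fixed numbers that do not come from the input.
single = [[0.7, 1.3, 0.0, 2.5]]
inference_out = batch_norm_inference(
    single,
    moving_mean=[0.0, 0.0, 0.0, 0.0],  # assumed initial moving mean
    moving_var=[1.0, 1.0, 1.0, 1.0],   # assumed initial moving variance
)
print(inference_out)  # close to the input, since mean = 0 and variance = 1
```

With these statistics the layer is almost an identity map, so the non-zero activations reach the last dense layer and produce a non-zero result.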


Conclusion: use a batch size > 1 and feed tensors with random or realistic data values instead of tf.ones(), in which all elements are identical.
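To see why both parts of that advice matter, the sketch below compares a batch of identical samples with a batch of varied samples. It is a simplified pure-Python stand-in for training-mode batch normalization (gamma = 1, beta = 0 assumed); the helper name and the data are hypothetical.

```python
import math
import random
import statistics

def batch_norm_train(batch, eps=1e-3):
    """Training-mode batch norm with gamma = 1 and beta = 0 (illustration only)."""
    columns = list(zip(*batch))  # one tuple of values per feature
    means = [statistics.fmean(c) for c in columns]
    variances = [statistics.pvariance(c) for c in columns]
    return [
        [(x - m) / math.sqrt(v + eps) for x, m, v in zip(sample, means, variances)]
        for sample in batch
    ]

random.seed(0)
# Batch size 4, but every sample is identical (like tf.ones): still all zeros,
# because each feature's batch mean equals every value in that feature.
identical = [[1.0, 1.0, 1.0] for _ in range(4)]
# Batch size 4 with varied values: the samples differ from the batch mean.
varied = [[random.random() for _ in range(3)] for _ in range(4)]

identical_out = batch_norm_train(identical)
varied_out = batch_norm_train(varied)
print(identical_out)
print(varied_out)
```

A larger batch alone is not enough: the samples also have to differ from one another for the normalized values to be non-zero.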