Batch normalization setup train and test time

I've recently read many articles discussing batch normalization in Keras.

According to this site: Setting "training=False" of "tf.layers.batch_normalization" when training will get a better validation result

The answer says:

If you turn on batch normalization with training = True that will start to normalize the batches within themselves and collect a moving average of the mean and variance of each batch. Now here's the tricky part. The moving average is an exponential moving average, with a default momentum of 0.99 for tf.layers.batch_normalization(). The mean starts at 0, the variance at 1 again. But since each update is applied with a weight of ( 1 - momentum ), it will asymptotically reach the actual mean and variance in infinity. For example in 100 steps it will reach about 63.4% of the real value, because 0.99^100 is 0.366. If you have numerically large values, the difference can be enormous.
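The decay arithmetic in that answer can be checked directly. This is a minimal sketch (my own, not from the original post) of how far the exponential moving average has converged after n updates with the default momentum of 0.99:

```python
# Fraction of the true statistics captured by the exponential moving
# average after n updates with momentum m is 1 - m**n, since the
# initial value's weight decays as m**n.
momentum = 0.99

for n in (100, 500, 1000):
    reached = 1 - momentum ** n
    print(f"after {n:4d} steps: {reached:.1%} of the real value")
```

With a small batch size this converges slowly in wall-clock terms, because each batch contributes only one update.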

Since my batch size is small, it takes many more steps for the moving averages to converge, and the gap between training and test behavior can be large, which leads to poor predictions.

So I have to set training=False in the call, which, again per the link above, means:

When you set training = False that means the batch normalization layer will use its internally stored average of mean and variance to normalize the batch, not the batch's own mean and variance.

And I know that at test time we should use the moving mean and moving variance from training. I also know that moving_mean_initializer can be set:

keras.layers.BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', moving_mean_initializer='zeros', moving_variance_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None)
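One thing worth noting (my own sketch, assuming tf.keras): the moving statistics are ordinary non-trainable weights of the layer, stored alongside gamma and beta, which is why any weight-saving mechanism captures them without touching the initializers:

```python
from tensorflow import keras

# Build a BatchNormalization layer on a 4-feature input and inspect
# its weights: gamma and beta are trainable; moving_mean and
# moving_variance are non-trainable weights of the same layer.
bn = keras.layers.BatchNormalization()
bn.build(input_shape=(None, 4))

print([w.name for w in bn.weights])                # 4 weight variables
print([w.name for w in bn.non_trainable_weights])  # the 2 moving stats
```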

I'm not sure whether my understanding is correct:

(1) Set training=False at test time and training=True at training time

(2) Use history_weight = ModelCheckpoint(filepath="weights.{epoch:02d}.hdf5", save_weights_only=True, save_best_only=False) to store the normalization weights (including the moving mean and variance, and of course gamma and beta)

(3) Initialize the layer with what we got from step (2)
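If I understand steps (2) and (3) correctly, they can be sketched roughly like this (my own sketch, assuming tf.keras; note that recent Keras versions require the weights filename to end in ".weights.h5", while older tf.keras accepted the ".hdf5" suffix used above). Because the moving statistics are part of the layer's weights, step (3) reduces to load_weights; there is no need to pass them through moving_mean_initializer by hand:

```python
import numpy as np
from tensorflow import keras

# Toy model with a BatchNormalization layer.
def build_model():
    return keras.Sequential([
        keras.Input(shape=(4,)),
        keras.layers.Dense(8),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(1),
    ])

model = build_model()
model.compile(optimizer="adam", loss="mse")

# Step (2): checkpoint all weights every epoch -- gamma, beta,
# moving_mean and moving_variance are included automatically.
checkpoint = keras.callbacks.ModelCheckpoint(
    filepath="weights.{epoch:02d}.weights.h5",
    save_weights_only=True,
    save_best_only=False,
)

x = np.random.randn(64, 4).astype("float32")
y = np.random.randn(64, 1).astype("float32")
model.fit(x, y, epochs=2, callbacks=[checkpoint], verbose=0)

# Step (3): rebuild the same architecture and restore the stored
# weights -- the moving statistics come back with everything else.
restored = build_model()
restored.load_weights("weights.02.weights.h5")
```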

I don't know whether any of the above is wrong; if it is, please correct me.

I'm also not sure how people usually handle this problem. Does the approach I suggested work?

Thanks in advance!

I did some testing. After training, I set the moving mean and moving variance of every batch-normalization layer to zero, and it gave terrible results.

I believe that in inference mode, Keras uses the moving mean and moving variance.

As for the training flag: setting it to True makes the layer normalize with the batch's own statistics and update the moving mean and moving variance, while setting it to False makes it normalize with the stored moving statistics without updating them.
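This behavior is easy to verify with a small sketch (my own check, assuming tf.keras). The input is deliberately shifted away from zero, so normalizing with the batch's own statistics gives an output centered near zero, while the freshly initialized moving statistics (mean 0, variance 1) leave the shift mostly intact:

```python
import numpy as np
from tensorflow import keras

bn = keras.layers.BatchNormalization()  # default momentum=0.99
x = (np.random.randn(128, 4) * 3.0 + 7.0).astype("float32")  # mean ~7

# training=True: normalize with the batch's own statistics and update
# the moving averages -- output is centered near zero.
y_train = bn(x, training=True).numpy()

# training=False: normalize with the stored moving statistics (still
# close to their 0/1 initialization after one update) and leave them
# untouched -- output stays far from zero.
y_test = bn(x, training=False).numpy()

print(y_train.mean(), y_test.mean())
```

After many training=True steps the moving statistics approach the data statistics, and the two modes give similar outputs.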