Why does batch_normalization in tensorflow not give the expected results?
I want to look at the output of a batch_normalization layer in a small example, but apparently I'm doing something wrong, because I get the same output as the input.
import tensorflow as tf
import numpy as np
import keras.backend as K

K.set_image_data_format('channels_last')

X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3))  # samples are 2X2 images with 3 channels
outp = tf.layers.batch_normalization(inputs=X, axis=3)

x = np.random.rand(4, 2, 2, 3)  # sample set: 4 images
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    K.set_session(sess)
    a = sess.run(outp, feed_dict={X: x, K.learning_phase(): 0})
    print(a - x)  # print the difference between input and normalized output
The output of the code above is almost identical to the input. Can anyone point out the problem to me?
Keep in mind that batch_normalization behaves differently at training and test time. Here, you have never "trained" your batch normalization, so the moving mean it has learned is random but close to 0, and the moving variance factor is close to 1, so the output is almost identical to the input. If you feed K.learning_phase(): 1, you will already see some differences (because it will normalize using the batch's mean and standard deviation); if you first train on a large number of examples and then test on some others, you will also see the normalization happen, because the learned mean and standard deviation will not be 0 and 1.
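To see concretely why the output barely changes, here is a rough NumPy sketch of the inference-time formula the layer applies, assuming its default epsilon of 1e-3 and freshly initialized parameters (an illustration, not the actual TF kernel):

import numpy as np

# inference-time batch norm: y = gamma * (x - moving_mean) / sqrt(moving_var + eps) + beta
moving_mean, moving_var = 0.0, 1.0  # initial values of the moving statistics
gamma, beta = 1.0, 0.0              # initial values of scale and offset
eps = 1e-3                          # assumed: the layer's default epsilon

x = np.random.rand(4, 2, 2, 3)
y = gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta
print(np.abs(y - x).max())          # tiny: the output is essentially the input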
To get a better feel for the effect of batch norm, I also suggest you multiply your input by a large number (say 100), so that you can see a clear difference between non-normalized and normalized vectors; that will help you test what is going on.
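For instance, a quick NumPy sketch of what training-mode normalization over axis 3 does to such scaled inputs (per-channel batch statistics; the epsilon and the gamma=1, beta=0 parameters are assumptions for illustration):

import numpy as np

x = np.random.rand(4, 2, 2, 3) * 100          # inputs far from mean 0 / variance 1
mean = x.mean(axis=(0, 1, 2), keepdims=True)  # per-channel batch mean
var = x.var(axis=(0, 1, 2), keepdims=True)    # per-channel batch variance
y = (x - mean) / np.sqrt(var + 1e-3)          # batch norm with gamma=1, beta=0
print(np.abs(y - x).max())                    # large: normalization is clearly visible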
EDIT: in your code, the update of the moving mean and moving variance is never done. You need to make sure the update ops are run, as indicated in batch_normalization's doc. The following lines should make it work:
outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    outp = tf.identity(outp)
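Rebinding outp to tf.identity(outp) inside the control_dependencies block means that every evaluation of outp also triggers the update ops for the moving mean and variance, so the statistics actually get updated on every training step.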
Below is my full working code (I got rid of Keras because I don't know it well, but you should be able to add it back).
import tensorflow as tf
import numpy as np

X = tf.placeholder(tf.float32, shape=(None, 2, 2, 3))  # samples are 2X2 images with 3 channels
is_training = tf.placeholder(tf.bool, shape=())        # True while training, False at test time

outp = tf.layers.batch_normalization(inputs=X, axis=3, training=is_training, center=False, scale=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    outp = tf.identity(outp)

x = np.random.rand(4, 2, 2, 3) * 100  # sample set: 4 images
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    initial = sess.run(outp, feed_dict={X: x, is_training: False})
    for i in range(10000):
        a = sess.run(outp, feed_dict={X: x, is_training: True})
        if i % 1000 == 0:
            print("Step %i: " % i, a - x)  # print the difference between input and normalized output
    final = sess.run(outp, feed_dict={X: x, is_training: False})
    print("initial: ", initial)
    print("final: ", final)
    assert not np.array_equal(initial, final)
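With the updates wired in, the training-mode steps pull the moving mean and variance toward the batch statistics, so the inference-time final output is normalized and differs from initial, which is exactly what the assert at the end checks.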