"trainable" 和 "training" 标志在 tf.layers.batch_normalization 中的重要性
significance of "trainable" and "training" flag in tf.layers.batch_normalization
"trainable" 和 "training" 标志在 tf.layers.batch_normalization 中的意义是什么?这两者在训练和预测过程中有何不同?
`training` controls whether to use training-mode batch norm (which normalizes with statistics of the current minibatch) or inference-mode batch norm (which normalizes with the averaged statistics accumulated over the training data).

`trainable` controls whether the variables created inside the batch norm layer are themselves trainable.
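As a minimal numpy sketch of the two modes (the function names and the epsilon value of 1e-3, which matches the layer's default, are illustrative assumptions):

import numpy as np

EPS = 1e-3  # assumed; matches the default epsilon of tf.layers.batch_normalization

def batchnorm_training_mode(x, gamma, beta):
    # training mode: normalize with the current minibatch's statistics
    mean, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + EPS) + beta

def batchnorm_inference_mode(x, gamma, beta, moving_mean, moving_var):
    # inference mode: normalize with the accumulated moving statistics
    return gamma * (x - moving_mean) / np.sqrt(moving_var + EPS) + beta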
Batch norm has two phases:
1. Training:
   - Normalize layer activations using the mean and variance of the current batch, then scale and shift them with `beta` and `gamma`. (`training` should be `True`.)
   - Update the `moving_avg` and `moving_var` statistics; see the training-loop sketch after this list. (`trainable` should be `True`.)
2. Inference:
   - Normalize layer activations using the accumulated `moving_avg` and `moving_var`, then scale and shift them with `beta` and `gamma`. (`training` should be `False`.)
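In graph-mode TF 1.x the update of `moving_avg` and `moving_var` does not happen automatically: the update ops land in the `tf.GraphKeys.UPDATE_OPS` collection and must be run together with the train step. A minimal sketch of the usual pattern (`loss` and `optimizer` are placeholder names):

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # the moving statistics are refreshed every time train_op runs
    train_op = optimizer.minimize(loss)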
Example code to illustrate the different cases:
import numpy as np
import tensorflow as tf

# random image
img = np.random.randint(0, 10, (2, 2, 4)).astype(np.float32)

# batch norm params initialized
beta = np.ones((4)).astype(np.float32) * 1       # all ones
gamma = np.ones((4)).astype(np.float32) * 2      # all twos
moving_mean = np.zeros((4)).astype(np.float32)   # all zeros
moving_var = np.ones((4)).astype(np.float32)     # all ones

# placeholder for the input image
_input = tf.placeholder(tf.float32, shape=(1, 2, 2, 4), name='input')

# batch norm layer
out = tf.layers.batch_normalization(
    _input,
    beta_initializer=tf.constant_initializer(beta),
    gamma_initializer=tf.constant_initializer(gamma),
    moving_mean_initializer=tf.constant_initializer(moving_mean),
    moving_variance_initializer=tf.constant_initializer(moving_var),
    training=False, trainable=False)

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
init_op = tf.global_variables_initializer()

# run the graph in a session
with tf.Session() as sess:
    # init the variables
    sess.run(init_op)
    for i in range(2):
        ops, o = sess.run([update_ops, out], feed_dict={_input: np.expand_dims(img, 0)})
        print('beta', sess.run('batch_normalization/beta:0'))
        print('gamma', sess.run('batch_normalization/gamma:0'))
        print('moving_avg', sess.run('batch_normalization/moving_mean:0'))
        print('moving_variance', sess.run('batch_normalization/moving_variance:0'))
        print('out', np.round(o))
        print('')
When `training=False` and `trainable=False`:
img = [[[4., 5., 9., 0.]...
out = [[ 9. 11. 19. 1.]...
The activation is only scaled and shifted by `gamma` and `beta`: with `moving_mean` all zeros and `moving_var` all ones, normalization is effectively a no-op, so out = gamma * img + beta = 2 * img + 1.
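A quick numpy check of this case, assuming the layer's default epsilon of 1e-3:

manual = gamma * (img - moving_mean) / np.sqrt(moving_var + 1e-3) + beta
print(np.round(manual))  # reproduces out above, e.g. 4, 5, 9, 0 -> 9, 11, 19, 1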
When `training=True` and `trainable=False`:
out = [[ 2. 2. 3. -1.] ...
The activation is normalized using the mean and variance of the current batch, then scaled and shifted with `gamma` and `beta`.
The moving averages are not updated.
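The same output can be reproduced by hand from the batch statistics (per-channel mean and variance of `img`, again assuming epsilon 1e-3):

m = img.mean(axis=(0, 1))  # per-channel mean of the current batch
v = img.var(axis=(0, 1))   # per-channel variance of the current batch
manual = gamma * (img - m) / np.sqrt(v + 1e-3) + beta
print(np.round(manual))    # matches the out shown above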
When `training=True` and `trainable=True`:
The output is the same as above, but `moving_avg` and `moving_var` get updated to new values:
moving_avg [0.03249997 0.03499997 0.06499994 0.02749997]
moving_variance [1.0791667 1.1266665 1.0999999 1.0925]
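These values are consistent with an exponential moving average, assuming the layer's default `momentum=0.99`:

momentum = 0.99  # assumed; default of tf.layers.batch_normalization
new_moving_avg = momentum * moving_mean + (1 - momentum) * img.mean(axis=(0, 1))
new_moving_var = momentum * moving_var + (1 - momentum) * img.var(axis=(0, 1))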
It is complicated. In TF 2.0 the behavior changed; see the following note from the documentation:
About setting `layer.trainable = False` on a `BatchNormalization` layer:

The meaning of setting `layer.trainable = False` is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during `fit()` or `train_on_batch()`, and its state updates will not be run. Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the `training` argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.

However, in the case of the `BatchNormalization` layer, setting `trainable = False` on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch). This behavior has been introduced in TensorFlow 2.0, in order to enable `layer.trainable = False` to produce the most commonly expected behavior in the convnet fine-tuning use case. Note that:

- This behavior only occurs as of TensorFlow 2.0. In 1.*, setting `layer.trainable = False` would freeze the layer but would not switch it to inference mode.
- Setting `trainable` on a model containing other layers will recursively set the `trainable` value of all inner layers.
- If the value of the `trainable` attribute is changed after calling `compile()` on a model, the new value doesn't take effect for this model until `compile()` is called again.
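A minimal TF 2.x sketch of the fine-tuning use case described above (the choice of MobileNetV2 and a 10-class head is hypothetical, purely for illustration):

import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    include_top=False, weights='imagenet', pooling='avg')
base.trainable = False  # freezes weights; in TF 2.x this also runs its BatchNormalization layers in inference mode

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # explicit, so batch norm stays in inference mode even if base is unfrozen later
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# note: if trainable is changed later, compile() must be called again for it to take effect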
"trainable" 和 "training" 标志在 tf.layers.batch_normalization 中的意义是什么?这两者在训练和预测过程中有何不同?
training
控制是使用训练模式 batchnorm(使用来自该 minibatch 的统计数据)还是推理模式 batchnorm(使用训练数据的平均统计数据)。 trainable
控制在 batchnorm 过程中创建的变量本身是否可训练。
batch norm有两个阶段:
1. Training:
- Normalize layer activations using `moving_avg`, `moving_var`, `beta` and `gamma`
(`training`* should be `True`.)
- update the `moving_avg` and `moving_var` statistics.
(`trainable` should be `True`)
2. Inference:
- Normalize layer activations using `beta` and `gamma`.
(`training` should be `False`)
用于说明几种情况的示例代码:
#random image
img = np.random.randint(0,10,(2,2,4)).astype(np.float32)
# batch norm params initialized
beta = np.ones((4)).astype(np.float32)*1 # all ones
gamma = np.ones((4)).astype(np.float32)*2 # all twos
moving_mean = np.zeros((4)).astype(np.float32) # all zeros
moving_var = np.ones((4)).astype(np.float32) # all ones
#Placeholders for input image
_input = tf.placeholder(tf.float32, shape=(1,2,2,4), name='input')
#batch Norm
out = tf.layers.batch_normalization(
_input,
beta_initializer=tf.constant_initializer(beta),
gamma_initializer=tf.constant_initializer(gamma),
moving_mean_initializer=tf.constant_initializer(moving_mean),
moving_variance_initializer=tf.constant_initializer(moving_var),
training=False, trainable=False)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
init_op = tf.global_variables_initializer()
## 2. Run the graph in a session
with tf.Session() as sess:
# init the variables
sess.run(init_op)
for i in range(2):
ops, o = sess.run([update_ops, out], feed_dict={_input: np.expand_dims(img, 0)})
print('beta', sess.run('batch_normalization/beta:0'))
print('gamma', sess.run('batch_normalization/gamma:0'))
print('moving_avg',sess.run('batch_normalization/moving_mean:0'))
print('moving_variance',sess.run('batch_normalization/moving_variance:0'))
print('out', np.round(o))
print('')
当training=False
和trainable=False
时:
img = [[[4., 5., 9., 0.]...
out = [[ 9. 11. 19. 1.]...
The activation is scaled/shifted using gamma and beta.
当training=True
和trainable=False
时:
out = [[ 2. 2. 3. -1.] ...
The activation is normalized using `moving_avg`, `moving_var`, `gamma` and `beta`.
The averages are not updated.
当traning=True
和trainable=True
时:
The out is same as above, but the `moving_avg` and `moving_var` gets updated to new values.
moving_avg [0.03249997 0.03499997 0.06499994 0.02749997]
moving_variance [1.0791667 1.1266665 1.0999999 1.0925]
这很复杂。 在 TF 2.0 中,行为发生了变化,请参见:
About setting
layer.trainable = False
on aBatchNormalization
layer:The meaning of setting
layer.trainable = False
is to freeze the layer, i.e. its internal state will not change during training:
its trainable weights will not be updated duringfit()
ortrain_on_batch()
, and its state updates will not be run. Usually, this does not necessarily mean that the layer is run in inference
mode (which is normally controlled by thetraining
argument that can be passed when calling a layer). "Frozen state" and "inference mode"
are two separate concepts.However, in the case of the
BatchNormalization
layer, setting
trainable = False
on the layer means that the layer will be
subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch,
rather than using the mean and variance of the current batch). This behavior has been introduced in TensorFlow 2.0, in order to enablelayer.trainable = False
to produce the most commonly expected behavior in the convnet fine-tuning use case. Note that:
- This behavior only occurs as of TensorFlow 2.0. In 1.*, setting
layer.trainable = False
would freeze the layer but would not switch it to inference mode.- Setting
trainable
on an model containing other layers will recursively set thetrainable
value of all inner layers.- If the value of the
trainable
attribute is changed after callingcompile()
on a model, the new value doesn't take effect for this model untilcompile()
is called again.