Batch normalization in TensorFlow: variables and performance
I would like to add conditional operations on the variables of a batch normalization layer. Specifically: train in float first, then quantize during a fine-tuning second training phase. For that, I want to add a tf.cond operation on the variables (the scale and shift factors, and the exponential moving averages of the mean and variance).
I replaced tf.layers.batch_normalization with a batchnorm layer I wrote
(see below).
This function works well (i.e. I get the same metrics with both functions), and I can add any pipeline I want on the variables (before the batchnorm operation itself); a sketch of the kind of conditional op I have in mind follows the code below. The problem is that the performance (runtime) dropped dramatically: simply replacing layers.batchnorm with my own function introduces a factor of roughly x2, as shown below.
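For reference, the built-in call being replaced looks roughly like this (the momentum and epsilon values here are illustrative, and is_training is assumed to be a boolean tensor fed at run time):

# Hypothetical baseline: the built-in layer this question replaces.
y = tf.layers.batch_normalization(
    x,
    momentum=0.99,          # decay of the moving mean/variance
    epsilon=0.001,          # numerical-stability constant
    training=is_training)   # selects batch vs. moving statistics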
def batchnorm(self, x, name, epsilon=0.001, decay=0.99):
    epsilon = tf.to_float(epsilon)
    decay = tf.to_float(decay)
    with tf.variable_scope(name):
        shape = x.get_shape().as_list()
        channels_num = shape[3]
        # scale factor
        gamma = tf.get_variable("gamma", shape=[channels_num], initializer=tf.constant_initializer(1.0), trainable=True)
        # shift value
        beta = tf.get_variable("beta", shape=[channels_num], initializer=tf.constant_initializer(0.0), trainable=True)
        moving_mean = tf.get_variable("moving_mean", channels_num, initializer=tf.constant_initializer(0.0), trainable=False)
        moving_var = tf.get_variable("moving_var", channels_num, initializer=tf.constant_initializer(1.0), trainable=False)

        batch_mean, batch_var = tf.nn.moments(x, axes=[0, 1, 2])  # per channel
        update_mean = moving_mean.assign((decay * moving_mean) + ((1. - decay) * batch_mean))
        update_var = moving_var.assign((decay * moving_var) + ((1. - decay) * batch_var))
        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_mean)
        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_var)

        bn_mean = tf.cond(self.is_training, lambda: tf.identity(batch_mean), lambda: tf.identity(moving_mean))
        bn_var = tf.cond(self.is_training, lambda: tf.identity(batch_var), lambda: tf.identity(moving_var))

        with tf.variable_scope(name + "_batchnorm_op"):
            inv = tf.math.rsqrt(bn_var + epsilon)
            inv *= gamma
            output = ((x * inv) - (bn_mean * inv)) + beta

    return output
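To make the intent concrete, the kind of conditional pipeline I would insert on a variable (here gamma) before the normalization math looks roughly like this. The fake-quant op, its thresholds, and the quantize_phase flag are illustrative placeholders, not part of the code above:

# Hypothetical conditional op on a batchnorm variable: during the
# quantization fine-tuning phase, pass gamma through a fake-quantization
# op; otherwise use it unchanged. `quantize_phase` is an assumed boolean
# tensor, and the min/max/num_bits values are illustrative only.
gamma_q = tf.cond(
    quantize_phase,
    lambda: tf.quantization.fake_quant_with_min_max_args(gamma, min=-6.0, max=6.0, num_bits=8),
    lambda: tf.identity(gamma))
# gamma_q would then replace gamma in the normalization expression above.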
I would appreciate help with any of the following:
- Any ideas on how to improve the performance (reduce the runtime) of my solution?
- Is it possible to add my own operators to the variables pipeline of layers.batchnorm, before the batchnorm op?
- Any other solution to the same problem?
tf.nn.fused_batch_norm
is optimized and did the trick.
I had to create two subgraphs, one per mode, since fused_batch_norm
's interface does not take a conditional training/test mode (is_training is a bool and not a tensor, so the graph it builds is not conditional). I added the condition afterwards (see below). However, even with the two subgraphs, the runtime is roughly the same as tf.layers.batch_normalization
.
Here is the final solution (I would still appreciate any comment or suggestion for improvement):
def batchnorm(self, x, name, epsilon=0.001, decay=0.99):
    with tf.variable_scope(name):
        shape = x.get_shape().as_list()
        channels_num = shape[3]
        # scale factor
        gamma = tf.get_variable("gamma", shape=[channels_num], initializer=tf.constant_initializer(1.0), trainable=True)
        # shift value
        beta = tf.get_variable("beta", shape=[channels_num], initializer=tf.constant_initializer(0.0), trainable=True)
        moving_mean = tf.get_variable("moving_mean", channels_num, initializer=tf.constant_initializer(0.0), trainable=False)
        moving_var = tf.get_variable("moving_var", channels_num, initializer=tf.constant_initializer(1.0), trainable=False)

        # Training subgraph: normalize with batch statistics and return them.
        (output_train, batch_mean, batch_var) = tf.nn.fused_batch_norm(x,
                                                                       gamma,
                                                                       beta,  # pylint: disable=invalid-name
                                                                       mean=None,
                                                                       variance=None,
                                                                       epsilon=epsilon,
                                                                       data_format="NHWC",
                                                                       is_training=True,
                                                                       name="_batchnorm_op")
        # Inference subgraph: normalize with the moving statistics.
        (output_test, _, _) = tf.nn.fused_batch_norm(x,
                                                     gamma,
                                                     beta,  # pylint: disable=invalid-name
                                                     mean=moving_mean,
                                                     variance=moving_var,
                                                     epsilon=epsilon,
                                                     data_format="NHWC",
                                                     is_training=False,
                                                     name="_batchnorm_op")

        # Select the subgraph at run time via the is_training tensor.
        output = tf.cond(self.is_training, lambda: tf.identity(output_train), lambda: tf.identity(output_test))

        update_mean = moving_mean.assign((decay * moving_mean) + ((1. - decay) * batch_mean))
        update_var = moving_var.assign((decay * moving_var) + ((1. - decay) * batch_var))
        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_mean)
        tf.add_to_collection(tf.GraphKeys.UPDATE_OPS, update_var)

    return output
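As with the built-in layer, the moving-average updates collected in UPDATE_OPS still have to be run explicitly. In TF 1.x the usual pattern, sketched here with a hypothetical optimizer and loss, is:

# Make the train op depend on the collected moving-average updates.
# `optimizer` and `loss` stand in for whatever the actual model uses.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)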