Implementing batch normalization with TensorFlow
I am trying to implement a batch normalization layer in TensorFlow. I have no problem running the training step, using tf.moments to get the mean and variance. For test time, I'd like to set up an exponential moving average to track the mean and variance. I am trying to do it like this:
def batch_normalized_linear_layer(state_below, scope_name, n_inputs, n_outputs, stddev, wd, eps=.0001):
    with tf.variable_scope(scope_name) as scope:
        weight = _variable_with_weight_decay(
            "weights", shape=[n_inputs, n_outputs],
            stddev=stddev, wd=wd
        )
        act = tf.matmul(state_below, weight)
        # get moments
        act_mean, act_variance = tf.nn.moments(act, [0])
        # get mean and variance variables
        mean = _variable_on_cpu('bn_mean', [n_outputs], tf.constant_initializer(0.0))
        variance = _variable_on_cpu('bn_variance', [n_outputs], tf.constant_initializer(1.0))
        # assign the moments
        assign_mean = mean.assign(act_mean)
        assign_variance = variance.assign(act_variance)
        act_bn = tf.mul((act - mean), tf.rsqrt(variance + eps), name=scope.name + "_bn")
        beta = _variable_on_cpu("beta", [n_outputs], tf.constant_initializer(0.0))
        gamma = _variable_on_cpu("gamma", [n_outputs], tf.constant_initializer(1.0))
        bn = tf.add(tf.mul(act_bn, gamma), beta)
        output = tf.nn.relu(bn, name=scope.name)
        _activation_summary(output)
        return output, mean, variance
where _variable_on_cpu is defined as:
def _variable_on_cpu(name, shape, initializer):
    """Helper to create a Variable stored on CPU memory.

    Args:
        name: name of the variable
        shape: list of ints
        initializer: initializer for Variable

    Returns:
        Variable Tensor
    """
    with tf.device('/cpu:0'):
        var = tf.get_variable(name, shape, initializer=initializer)
    return var
I believe I am setting

assign_mean = mean.assign(act_mean)
assign_variance = variance.assign(act_variance)

incorrectly, but I am not sure how. When I track these mean and variance variables with TensorBoard, they stay flat at their initialized values.
Rafal's comment gets at the heart of the problem: you're not running the assign nodes. You might try using the batchnorm helper I posted in another answer, or you can force the assignments to happen by adding with_dependencies, as he suggests.

The general principle is that you should only count on a node being run if data or control dependencies flow "through" it. with_dependencies ensures that the specified dependencies have completed before the output op is used.
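Concretely, in the TF 0.x-era API the question uses, that means wrapping the normalized tensor in with_dependencies so the two assign ops become prerequisites of whatever downstream code consumes it. A sketch of just the lines that change inside batch_normalized_linear_layer (the control_flow_ops import path shifted between early releases, so treat it as approximate):

from tensorflow.python.ops import control_flow_ops

# ... after creating the assign ops as before:
assign_mean = mean.assign(act_mean)
assign_variance = variance.assign(act_variance)

# with_dependencies returns a tensor with the same value as its second
# argument, but with control edges to the listed ops, so every evaluation
# of act_bn now also runs both assignments.
act_bn = control_flow_ops.with_dependencies(
    [assign_mean, assign_variance],
    tf.mul((act - mean), tf.rsqrt(variance + eps), name=scope.name + "_bn"))

With this change, fetching the layer's output in a session is enough to update bn_mean and bn_variance; nothing else in the function needs to change.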