Using batch norm when restoring the model?
I have a problem using batch norm when restoring a model in TensorFlow.
Below is my batch norm function, taken from here:
def _batch_normalization(self, input_tensor, is_training, batch_norm_epsilon, decay=0.999):
    """Batch normalization for dense nets.

    Args:
        input_tensor: `tensor`, the input tensor to normalize.
        is_training: `bool`, if true then update the mean/variance using a moving average,
            else use the stored mean/variance.
        batch_norm_epsilon: `float`, param for batch normalization.
        decay: `float`, param for updating the moving average, default is 0.999.

    Returns:
        the normalized tensor.
    """
    # batch normalization operates over the channels dimension.
    input_shape_channels = int(input_tensor.get_shape()[-1])
    # scale and beta are used in the formula: scale * (x - E(x)) / sqrt(var(x)) + beta
    scale = tf.Variable(tf.ones([input_shape_channels]))
    beta = tf.Variable(tf.zeros([input_shape_channels]))
    # global mean and var are the moving-averaged mean and var.
    global_mean = tf.Variable(tf.zeros([input_shape_channels]), trainable=False)
    global_var = tf.Variable(tf.ones([input_shape_channels]), trainable=False)
    # if training, update the mean and var; else use the trained mean/var directly.
    if is_training:
        # batch norm over all axes except the channel axis.
        axis = list(range(len(input_tensor.get_shape()) - 1))
        batch_mean, batch_var = tf.nn.moments(input_tensor, axes=axis)
        # update the moving mean and var.
        train_mean = tf.assign(global_mean, global_mean * decay + batch_mean * (1 - decay))
        train_var = tf.assign(global_var, global_var * decay + batch_var * (1 - decay))
        with tf.control_dependencies([train_mean, train_var]):
            return tf.nn.batch_normalization(input_tensor,
                                             batch_mean, batch_var, beta, scale,
                                             batch_norm_epsilon)
    else:
        return tf.nn.batch_normalization(input_tensor,
                                         global_mean, global_var, beta, scale,
                                         batch_norm_epsilon)
I train the model and save it with tf.train.Saver(). Below is the test code:
def inference(self, images_for_predict):
    """Load the pre-trained model and do the inference.

    Args:
        images_for_predict: `tensor`, images to predict using the pre-trained model.

    Returns:
        the predicted labels.
    """
    tf.reset_default_graph()
    images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False)
    predictions = []
    correct = 0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # saver = tf.train.import_meta_graph('./models/dense_nets_model/dense_nets.ckpt.meta')
        # saver.restore(sess, tf.train.latest_checkpoint('./models/dense_nets_model/'))
        saver.restore(sess, './models/dense_nets_model/dense_nets.ckpt')
        for i in range(100):
            pred, corr = sess.run([tf.argmax(prediction, 1), accuracy],
                                  feed_dict={
                                      images: [images_for_predict.images[i]],
                                      labels: [images_for_predict.labels[i]]})
            correct += corr
            predictions.append(pred[0])
    print("PREDICTIONS:", predictions)
    print("ACCURACY:", correct / 100)
But the predictions are always bad, like this:
('PREDICTIONS:', [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
('ACCURACY:', 0.080000000000000002)
Some hints: images_for_predict = mnist.test, and the self._build_graph method takes two parameters: batch_size and is_training.
Can anyone help me?
Looking at your implementation of batch norm, when you load your model you need to keep the graph built with images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False) and load the weight values from the checkpoint, but not the meta graph. I think saver.restore(sess, './models/dense_nets_model/dense_nets.ckpt') also restores the meta graph now (sorry if I'm wrong), so you need to restore only its "data" part.
Otherwise, you are just using the training graph, in which the mean and variance used in batch norm are the ones obtained from the batch. But at test time your batch has size 1, so normalizing by the batch's mean and variance always turns your data into 0, hence the constant output.
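You can see the size-1 effect in isolation: with a single example per batch, tf.nn.moments returns that example itself as the mean and zero variance, so the normalized output is all zeros (up to the epsilon). A minimal sketch:

import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0]])      # a "batch" containing a single example
mean, var = tf.nn.moments(x, axes=[0])  # mean == x[0], var == [0, 0, 0]
y = tf.nn.batch_normalization(x, mean, var, offset=None, scale=None,
                              variance_epsilon=1e-3)
with tf.Session() as sess:
    print(sess.run(y))                  # ~[[0. 0. 0.]]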
In any case, I'd suggest using tf.layers.batch_normalization instead, with an is_training placeholder that you'll need to feed to your network...
After trying a lot of approaches, I solved the problem. Here is what I did.
First of all, thanks to @gdelab. I switched to tf.layers.batch_normalization, so my batch norm function now looks like this:
def _batch_normalization(self, input_tensor, is_training):
    return tf.layers.batch_normalization(input_tensor, training=is_training)
The is_training argument is a placeholder like this: is_training = tf.placeholder(tf.bool).
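With is_training as a placeholder, every sess.run call has to feed it. A sketch of the two call sites (images, labels, train_step and accuracy are the names used elsewhere in this question; batch_images/batch_labels stand for whatever mini-batch you feed):

# training: use batch statistics and update the moving averages
sess.run(train_step, feed_dict={images: batch_images, labels: batch_labels,
                                is_training: True})
# testing: use the stored moving mean/variance
sess.run(accuracy, feed_dict={images: test_images, labels: test_labels,
                              is_training: False})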
When building the graph, remember to add this code to your optimization step:
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = tf.train.AdamOptimizer(self.learning_rate).minimize(cross_entropy)
This is needed because the ops that tf.layers.batch_normalization adds to update the mean and variance are not automatically added as dependencies of the train op, so if you don't do anything extra they never get run.
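If you want to confirm the update ops are actually in the collection, you can print them (a sketch; the AssignMovingAvg names shown are what tf.layers.batch_normalization typically generates):

for op in tf.get_collection(tf.GraphKeys.UPDATE_OPS):
    print(op.name)  # e.g. batch_normalization/AssignMovingAvg, .../AssignMovingAvg_1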
Then train the network; when training is done, save the model with code like this:
saver = tf.train.Saver(var_list=tf.global_variables())
savepath = saver.save(sess, 'here_is_your_personal_model_path')
Note that the var_list=tf.global_variables() argument makes sure TensorFlow saves all the parameters, including the global mean/var which are set to be non-trainable.
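To see why var_list matters here: the moving mean/variance live in tf.global_variables() but not in tf.trainable_variables(), so a saver built only from the trainable variables would silently drop them. A quick check (sketch):

trainable = set(tf.trainable_variables())
extra = [v.name for v in tf.global_variables() if v not in trainable]
print(extra)  # should include the moving_mean / moving_variance variables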
When restoring and testing the model, do it like this:
# build the graph like training:
images, labels, _, _, prediction, accuracy, saver = self._build_graph(1, False)
saver = tf.train.Saver()
saver.restore(sess, 'here_is_your_personal_model_path')
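As a sanity check after saver.restore (a sketch; moving_mean and moving_variance are the default names tf.layers.batch_normalization gives its statistics variables), print the restored moving statistics and make sure they are no longer the initial 0/1 values:

for v in tf.global_variables():
    if 'moving_mean' in v.name or 'moving_variance' in v.name:
        print(v.name, sess.run(v))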
Now you can test your model. Hope this helps, thanks!