Why does `optimizer.minimize()` not return loss with `tf.slim.learning.train()`?
I'm fine-tuning a network, `vgg16`, using `tf-slim`. I want to manipulate the gradients by hand so I can apply a different learning rate to the last layers. But when I try to use either `opt.minimize()`, or `tf.gradients()` together with `opt.apply_gradients()`, I get a `None` loss value in the summary report.
Why does this `train_op` code path work:
optimizer = tf.train.GradientDescentOptimizer( learning_rate=.001 )
train_op = slim.learning.create_train_op(total_loss, optimizer,
global_step=global_step)
slim.learning.train(train_op, log_dir,
init_fn=init_fn,
global_step=global_step,
number_of_steps=25,
save_summaries_secs=300,
save_interval_secs=600
)
But creating the `train_op` by hand fails with the exception below (e.g. `total_loss` appears to be `None`):
trainable = tf.trainable_variables()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)
train_op = optimizer.minimize( total_loss, global_step=global_step )
# exception: appears that loss is None
--- Logging error ---
Traceback (most recent call last):
...
File "/anaconda/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 755, in train
sess, train_op, global_step, train_step_kwargs)
File "/anaconda/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 506, in train_step
np_global_step, total_loss, time_elapsed)
File "/anaconda/anaconda3/lib/python3.6/logging/__init__.py", line 338, in getMessage
msg = msg % self.args
TypeError: must be real number, not NoneType
...
Message: 'global step %d: loss = %.4f (%.3f sec/step)'
Arguments: (29, None, 51.91366386413574)
What am I doing wrong?
The problem is that, despite the name `create_train_op()`, `slim` creates a return type that differs from the usual definition of a `train_op`, which is what you use in the second case, i.e. the "non-slim" call:
optimizer.minimize( total_loss, global_step=global_step )
Try this:
optimizer = tf.train.GradientDescentOptimizer( learning_rate=.001 )
train_op_no_slim = optimizer.minimize(total_loss)
train_op = slim.learning.create_train_op(total_loss, optimizer)
print(train_op_no_slim)
print(train_op)
For the first `print` statement, I get the "usual" (in tensorflow):
name: "GradientDescent_2"
op: "NoOp"
input: "^GradientDescent_2/update_layer1/weight1/ApplyGradientDescent"
input: "^GradientDescent_2/update_layer1/bias1/ApplyGradientDescent"
input: "^GradientDescent_2/update_layer2/weight2/ApplyGradientDescent"
input: "^GradientDescent_2/update_layer2/bias2/ApplyGradientDescent"
input: "^GradientDescent_2/update_layer3/weight3/ApplyGradientDescent"
input: "^GradientDescent_2/update_layer3/bias3/ApplyGradientDescent"
For the second `print` statement, I get:
Tensor("train_op_1/control_dependency:0", shape=(), dtype=float32)
In short, `slim.learning.create_train_op` does not have the same return type as `optimizer.minimize()`: the former is a `Tensor` that evaluates to the loss, the latter is an `Operation` that evaluates to `None`, which is why slim's logging blows up on it.
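If you still want to feed a hand-built update op to slim's training loop, you can bridge the gap yourself: wrap the `minimize()` op in a control dependency so the "train op" is a loss-valued `Tensor`, which is essentially what `create_train_op` does internally. A minimal sketch with a toy loss (written against the `tf.compat.v1` API so it is not tied to a contrib-era install; the variable names are my own):

```python
import tensorflow.compat.v1 as tf  # TF1-style graph mode
tf.disable_eager_execution()

# Toy loss standing in for total_loss.
w = tf.get_variable("w_bridge", initializer=5.0)
total_loss = tf.square(w)

global_step = tf.train.get_or_create_global_step()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# minimize() returns an Operation; evaluating it yields None.
minimize_op = optimizer.minimize(total_loss, global_step=global_step)

# Wrap it so the "train op" is a Tensor that performs the update
# and evaluates to the loss -- the shape slim's training loop expects.
with tf.control_dependencies([minimize_op]):
    train_op = tf.identity(total_loss, name="train_op")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    loss_val = sess.run(train_op)  # a float, not None
```

This reproduces the `train_op/control_dependency` tensor shown in the second `print` above.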
To work around this: using a directly defined `train_op` takes you outside the standard `slim` pipeline. I suggest embracing that and driving the directly defined `train_op` the non-slim way, via `sess.run()` or `train_op.run()`, as in a typical (non-slim) tensorflow example.
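In that non-slim style, the training loop fetches the loss tensor explicitly alongside the op. A minimal self-contained sketch with a toy loss (again using `tf.compat.v1`; the names are stand-ins, not the original vgg16 graph):

```python
import tensorflow.compat.v1 as tf  # TF1-style graph mode
tf.disable_eager_execution()

# Toy loss standing in for the fine-tuning total_loss.
w = tf.get_variable("w_loop", initializer=4.0)
total_loss = tf.square(w - 1.0)

global_step = tf.train.get_or_create_global_step()
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = optimizer.minimize(total_loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(25):
        # Fetch the loss next to the op: train_op alone yields None.
        _, loss_val = sess.run([train_op, total_loss])
    final_loss = loss_val
```

You lose slim's built-in logging and checkpointing, but the loss is now in your hands on every step.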
My use case was to apply a different `learning_rate` to the last fine-tuned layers of the model, which seems to mean I have to use a second optimizer. On the assumption that sticking with the framework will pay off later, here is what I had to do to piece together an equivalent of `tf.slim.create_train_op()` that accepts multiple `optimizers` and matching `grads_and_vars`:
def slim_learning_create_train_op_with_manual_grads(
        total_loss, optimizers, grads_and_vars,
        global_step=None,
        # update_ops=None,
        # variables_to_train=None,
        clip_gradient_norm=0,
        summarize_gradients=False,
        gate_gradients=1,  # tf.python.training.optimizer.Optimizer.GATE_OP
        aggregation_method=None,
        colocate_gradients_with_ops=False,
        gradient_multipliers=None,
        check_numerics=True):
    """Builds a slim-compatible train_op from pre-computed gradients.

    Modified from slim.learning.create_train_op() to work with a
    matched list of optimizers and grads_and_vars.

    Returns:
        train_op - a Tensor that, when evaluated, applies every
            gradient update and yields the value of total_loss.
    """
    from tensorflow.python.framework import ops
    from tensorflow.python.ops import array_ops
    from tensorflow.python.ops import control_flow_ops
    from tensorflow.python.training import training_util

    def transform_grads_fn(grads):
        if gradient_multipliers:
            with ops.name_scope('multiply_grads'):
                grads = slim.learning.multiply_gradients(
                    grads, gradient_multipliers)
        # Clip gradients.
        if clip_gradient_norm > 0:
            with ops.name_scope('clip_grads'):
                grads = slim.learning.clip_gradient_norms(
                    grads, clip_gradient_norm)
        return grads

    if global_step is None:
        global_step = training_util.get_or_create_global_step()
    assert len(optimizers) == len(grads_and_vars)

    ### order of processing:
    # 0. grads = opt.compute_gradients()
    # 1. grads = transform_grads_fn(grads)
    # 2. add_gradients_summaries(grads)
    # 3. grads = opt.apply_gradients(grads, global_step=global_step)
    grad_updates = []
    for i in range(len(optimizers)):
        grads = grads_and_vars[i]          # 0. from opt.compute_gradients()
        grads = transform_grads_fn(grads)  # 1. transform_grads_fn()
        if summarize_gradients:
            with ops.name_scope('summarize_grads'):
                slim.learning.add_gradients_summaries(grads)  # 2.
        if i == 0:
            grad_update = optimizers[i].apply_gradients(      # 3.
                grads, global_step=global_step)  # update global_step only once
        else:
            grad_update = optimizers[i].apply_gradients(grads)
        grad_updates.append(grad_update)

    with ops.name_scope('train_op'):
        if check_numerics:
            total_loss = array_ops.check_numerics(
                total_loss, 'LossTensor is inf or nan')
        train_op = control_flow_ops.with_dependencies(grad_updates, total_loss)

    # Add the operation used for training to the 'train_op' collection.
    train_ops = ops.get_collection_ref(ops.GraphKeys.TRAIN_OP)
    if train_op not in train_ops:
        train_ops.append(train_op)
    return train_op
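For reference, this is how I wire it up: one optimizer per learning rate, and one `grads_and_vars` list per optimizer, computed over that optimizer's variables. The variable names below are toy stand-ins for the network body and the fine-tuned last layer, not the real vgg16 variables:

```python
import tensorflow.compat.v1 as tf  # TF1-style graph mode
tf.disable_eager_execution()

# Toy stand-ins for the network body and the fine-tuned last layer.
base_w = tf.get_variable("base_w", initializer=1.0)
head_w = tf.get_variable("head_w", initializer=1.0)
total_loss = tf.square(base_w - 2.0) + tf.square(head_w - 2.0)

# One optimizer per learning rate.
optimizers = [
    tf.train.GradientDescentOptimizer(learning_rate=0.001),  # body
    tf.train.GradientDescentOptimizer(learning_rate=0.01),   # last layer, 10x
]
# Matched list: grads_and_vars[i] was computed for optimizers[i].
grads_and_vars = [
    optimizers[0].compute_gradients(total_loss, var_list=[base_w]),
    optimizers[1].compute_gradients(total_loss, var_list=[head_w]),
]
# then:
# train_op = slim_learning_create_train_op_with_manual_grads(
#     total_loss, optimizers, grads_and_vars)
```

The resulting `train_op` is a loss-valued `Tensor`, so it can be handed to `slim.learning.train()` exactly like the output of `create_train_op()`.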