Compute status: Not found: Tensor name "input_producer/limit_epochs/epochs" not found in checkpoint files
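I am using the CIFAR10 example. I trained the network with the provided code, and training completed successfully. Because I want to evaluate each example in my dataset exactly once, I changed the inputs function in cifar10_input.py to the following.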
def inputs(eval_data, data_dir, batch_size):
    filename = os.path.join(data_dir, TEST_FILE)
    filename_queue = tf.train.string_input_producer([filename], num_epochs=1)
    image, label = read_and_decode(filename_queue)
    float_image = tf.image.per_image_whitening(image)
    min_fraction_of_examples_in_queue = 0.4
    min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_EVAL *
                             min_fraction_of_examples_in_queue)
    images, label_batch = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        num_threads=1,
        capacity=min_queue_examples + 3 * batch_size)
    tf.image_summary('images', images)
    return images, tf.reshape(label_batch, [batch_size])
I have isolated the problem to the following call:
tf.train.string_input_producer([filename], num_epochs=1)
If I do not set num_epochs=1, everything works. If I do set it, I get the following error.
0x2cf2700 Compute status: Not found: Tensor name "input_producer/limit_epochs/epochs" not found in checkpoint files /home/jkschin/tensorflow/my_code/data/svhn/train/model.ckpt-8000
Thanks for your help!
Edit 3 @mrry:
It still fails. Here is the trace.
Traceback (most recent call last):
  File "cnn_eval.py", line 148, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "cnn_eval.py", line 144, in main
    evaluate()
  File "cnn_eval.py", line 119, in evaluate
    saver = tf.train.Saver([v for v in variables_to_restore if v.name != "input_producer/limit_epochs/epochs"])
AttributeError: 'unicode' object has no attribute 'name'
Edit 4 @mrry:
softmax_linear/biases/ExponentialMovingAverage
conv2/biases/ExponentialMovingAverage
local4/biases/ExponentialMovingAverage
local3/biases/ExponentialMovingAverage
softmax_linear/weights/ExponentialMovingAverage
conv1/biases/ExponentialMovingAverage
local4/weights/ExponentialMovingAverage
conv2/weights/ExponentialMovingAverage
input_producer/limit_epochs/epochs
local3/weights/ExponentialMovingAverage
conv1/weights/ExponentialMovingAverage
Traceback (most recent call last):
  File "cnn_eval.py", line 148, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "cnn_eval.py", line 144, in main
    evaluate()
  File "cnn_eval.py", line 119, in evaluate
    saver = tf.train.Saver([v for v in variables_to_restore if v != "input_producer/limit_epochs/epochs"])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 784, in __init__
    restore_sequentially=restore_sequentially)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 437, in build
    vars_to_save = self._ValidateAndSliceInputs(names_to_variables)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 340, in _ValidateAndSliceInputs
    names_to_variables = self._VarListToDict(names_to_variables)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 314, in _VarListToDict
    raise TypeError("Variable to save is not a Variable: %s" % var)
TypeError: Variable to save is not a Variable: Tensor("Const:0", shape=(), dtype=string)
Edit 5 @mrry:
saver = tf.train.Saver([tf.Variable(0.0,validate_shape=False,name=v) for v in variables_to_restore if v != "input_producer/limit_epochs/epochs"])
0x21d0cb0 Compute status: Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [] rhs shape= [10]
[[Node: save/Assign_8 = Assign[T=DT_FLOAT, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](softmax_linear/biases/ExponentialMovingAverage, save/restore_slice_8/_20)]]
TL;DR: In cifar10_eval.py, change the saver constructor so that it is:
saver = tf.train.Saver([v for v in variables_to_restore
                        if v != "input_producer/limit_epochs/epochs"])
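This problem arises because tf.train.string_input_producer() internally creates a variable (called "input_producer/limit_epochs/epochs") when its num_epochs argument is not None. When a tf.train.Saver is created in cifar10_eval.py, it uses tf.all_variables(), which includes the implicitly created variable from tf.train.string_input_producer(). This list of variables determines the set of names that TensorFlow looks up in the checkpoint file. The checkpoint was written by a training graph that did not set num_epochs, so it has no entry under that name and the restore fails.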
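Currently there is no good way to refer to implicitly created variables other than by their name. Therefore, the best fix is to exclude the variable from the Saver constructor by name.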
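The traces in Edit 3 and Edit 4 suggest that, in the asker's script, variables_to_restore is a mapping from checkpoint names to variables, so iterating over it yields name strings rather than variables. Under that assumption, a name-based exclusion has to keep the name/variable pairs together. The sketch below shows the idea with stand-in objects; the helper drop_by_name and the constant EPOCHS_VAR are hypothetical names introduced for illustration.

```python
EPOCHS_VAR = "input_producer/limit_epochs/epochs"


def drop_by_name(name_to_var, excluded=(EPOCHS_VAR,)):
    """Return a copy of the mapping without the excluded checkpoint names."""
    return {name: var for name, var in name_to_var.items()
            if name not in excluded}


# Stand-in values; in the eval script these would be tf.Variable objects.
variables_to_restore = {
    "conv1/weights/ExponentialMovingAverage": object(),
    "conv1/biases/ExponentialMovingAverage": object(),
    EPOCHS_VAR: object(),
}

filtered = drop_by_name(variables_to_restore)
print(sorted(filtered))
```

Since tf.train.Saver accepts a dict of names to variables as its var_list, the filtered mapping could then be passed to it directly, for example tf.train.Saver(filtered). This has not been run against the asker's checkpoint.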
Another way to exclude the implicit variable "input_producer/limit_epochs/epochs" is to load only the trainable variables:
saver = tf.train.Saver(tf.trainable_variables())