Tensorflow multi gpu example error: Variable conv1/weights/ExponentialMovingAverage/ does not exist
Tensorflow multi gpu example error: Variable conv1/weights/ExponentialMovingAverage/ does not exist
我是运行本教程中提到的代码:https://www.tensorflow.org/tutorials/deep_cnn/
我从这里下载代码:https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10/
我是 运行 ubuntu 14.04 上 AWS 的 g2.4xlarge 机器上的代码。单个 gpu 示例运行良好,没有任何错误。
有人可以帮忙解决这个问题吗?我是运行0.12版本
ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python -c 'import tensorflow as tf; print(tf.version)'
0.12.head
ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py --num_gpus=2
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Traceback (most recent call last):
File "cifar10_multi_gpu_train.py", line 273, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "cifar10_multi_gpu_train.py", line 269, in main
train()
File "cifar10_multi_gpu_train.py", line 210, in train
variables_averages_op = variable_averages.apply(tf.trainable_variables())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 373, in apply
colocate_with_primary=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 110, in create_slot
return _create_slot_var(primary, val, "")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
use_resource=_is_resource(primary))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1034, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 933, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
"VarScope?" % name)
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
您可以在这里找到问题的答案:
Issue 6220
您需要输入:
with tf.variable_scope(tf.get_variable_scope())
在你的设备上运行的循环前面......
所以,这样做:
with tf.variable_scope(tf.get_variable_scope()):
for i in xrange(FLAGS.num_gpus):
with tf.device('/gpu:%d' % i):
解释在link...
这里引用:
When you do tf.get_variable_scope().reuse_variables() you set the
current scope to reuse variables. If you call the optimizer in such
scope, it's trying to reuse slot variables, which it cannot find, so
it throws an error. If you put a scope around, the
tf.get_variable_scope().reuse_variables() only affects that scope, so
when you exit it, you're back in the non-reusing mode, the one you
want.
Hope that helps, let me know if I should clarify more.
我是运行本教程中提到的代码:https://www.tensorflow.org/tutorials/deep_cnn/
我从这里下载代码:https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10/
我是 运行 ubuntu 14.04 上 AWS 的 g2.4xlarge 机器上的代码。单个 gpu 示例运行良好,没有任何错误。
有人可以帮忙解决这个问题吗?我是运行0.12版本
ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python -c 'import tensorflow as tf; print(tf.version)'
0.12.head
ubuntu@ip-xxx-xx-xx-xx:~/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10$ python cifar10_multi_gpu_train.py --num_gpus=2
>> Downloading cifar-10-binary.tar.gz 100.0%
Successfully downloaded cifar-10-binary.tar.gz 170052171 bytes.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Filling queue with 20000 CIFAR images before starting to train. This will take a few minutes.
WARNING:tensorflow:From /home/ubuntu/pythonworkspace/tensorflowdev/models-master/tutorials/image/cifar10/cifar10_input.py:135: image_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.image. Note that tf.summary.image uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on the scope they are created in. Also, the max_images argument was renamed to max_outputs.
Traceback (most recent call last):
File "cifar10_multi_gpu_train.py", line 273, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "cifar10_multi_gpu_train.py", line 269, in main
train()
File "cifar10_multi_gpu_train.py", line 210, in train
variables_averages_op = variable_averages.apply(tf.trainable_variables())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 373, in apply
colocate_with_primary=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 110, in create_slot
return _create_slot_var(primary, val, "")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/slot_creator.py", line 64, in _create_slot_var
use_resource=_is_resource(primary))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1034, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 933, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
"VarScope?" % name)
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
您可以在这里找到问题的答案: Issue 6220
您需要输入:
with tf.variable_scope(tf.get_variable_scope())
在你的设备上运行的循环前面......
所以,这样做:
with tf.variable_scope(tf.get_variable_scope()):
for i in xrange(FLAGS.num_gpus):
with tf.device('/gpu:%d' % i):
解释在link...
这里引用:
When you do tf.get_variable_scope().reuse_variables() you set the current scope to reuse variables. If you call the optimizer in such scope, it's trying to reuse slot variables, which it cannot find, so it throws an error. If you put a scope around, the tf.get_variable_scope().reuse_variables() only affects that scope, so when you exit it, you're back in the non-reusing mode, the one you want.
Hope that helps, let me know if I should clarify more.