Learning rate larger than 0.001 results in error
I am trying to put together the code from the Udacity Deep Learning course (Assignment 3 - Regularization) and the TensorFlow mnist_with_summaries.py tutorial. My code appears to run fine:
https://github.com/llevar/udacity_deep_learning/blob/master/multi-layer-net.py
but something odd is happening. The assignments all use a learning rate of 0.5, introducing exponential decay at some point. However, the code I put together only runs fine when I set the learning rate to 0.001 (with or without decay). If I set the initial rate to 0.1 or higher, I get the following error:
Traceback (most recent call last):
File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 175, in <module>
summary, my_accuracy, _ = my_session.run([merged, accuracy, train_step], feed_dict=feed_dict)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 564, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 637, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 659, in _do_call
e.code)
tensorflow.python.framework.errors.InvalidArgumentError: Nan in summary histogram for: layer1/weights/summaries/HistogramSummary
[[Node: layer1/weights/summaries/HistogramSummary = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](layer1/weights/summaries/HistogramSummary/tag, layer1/weights/Variable/read)]]
Caused by op u'layer1/weights/summaries/HistogramSummary', defined at:
File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 106, in <module>
layer1, weights_1 = nn_layer(x, num_features, 1024, 'layer1')
File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 79, in nn_layer
variable_summaries(weights, layer_name + '/weights')
File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 65, in variable_summaries
tf.histogram_summary(name, var)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py", line 113, in histogram_summary
tag=tag, values=values, name=scope)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 55, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
self._traceback = _extract_stack()
If I set the rate to 0.001, the code runs to completion with a test accuracy of 0.94.
Using TensorFlow 0.8 RC0 on Mac OS X.
It looks like your training is diverging (which is what gets you infinities or NaNs). There is no simple explanation for why things diverge under some conditions but not others, but generally a higher learning rate makes divergence more likely.
Edit, April 17
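To see divergence concretely, here is a minimal sketch (not from the original post) of gradient descent on a simple quadratic loss. With a small step size the iterate shrinks toward the minimum; past a threshold, each update overshoots and the iterate grows without bound:

```python
# Gradient descent on f(w) = w^2, whose gradient is 2w.
# The update is w <- w - lr * 2w = (1 - 2*lr) * w, so whenever
# |1 - 2*lr| > 1 (here, lr > 1.0) the magnitude of w grows each step.

def descend(lr, steps=50, w=1.0):
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(descend(0.1))   # converges toward 0
print(descend(1.1))   # diverges: |w| blows up
```

The same mechanism operates in a real network, just with a divergence threshold that depends on the loss surface rather than a closed-form constant.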
You are getting a NaN in the histogram summary, which most likely means there is a NaN in your weights or activations. NaNs are caused by numerically ill-behaved computations, such as taking the log of 0 and multiplying the result by 0. There is also a small chance of a bug in the histograms themselves; to rule that out, turn off summaries and see whether you can still train to good accuracy.
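As a quick illustration (using NumPy rather than the TensorFlow ops in the post), log(0) evaluates to -inf, and multiplying that by 0 — exactly what a cross-entropy term does when a predicted probability hits zero — yields NaN:

```python
import numpy as np

# Suppress the floating-point warnings so only the values are shown.
with np.errstate(divide="ignore", invalid="ignore"):
    log0 = np.log(0.0)   # -inf
    bad = 0.0 * log0     # 0 * -inf is undefined, i.e. nan

print(log0, bad)
```

Once a NaN like this appears in the loss, backpropagation spreads it into the weights, which is what the histogram summary then reports.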
To turn off summaries, replace this line:
merged = tf.merge_all_summaries()
with this:
merged = tf.constant(1)
and comment out this line:
test_writer.add_summary(summary)
Your cross entropy:
diff = y_ * tf.log(y)
does not account for the 0 * log(0) case. You could change it to:
cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))
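A NumPy sketch of the same clipping idea: bounding each probability to [1e-10, 1.0] before the log (as tf.clip_by_value does above) means a zero prediction contributes a large but finite penalty instead of NaN. The function and array names here are illustrative, not from the original code:

```python
import numpy as np

def clipped_cross_entropy(y_true, y_pred, eps=1e-10):
    # Mirror tf.clip_by_value(y_pred, 1e-10, 1.0) before taking the log,
    # so log never sees an exact zero.
    y_safe = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_safe))

y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.0, 1.0, 0.0])  # contains exact zeros

print(clipped_cross_entropy(y_true, y_pred))  # finite, no NaN
```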