llevar llevar - 8 months ago 114
Python Question

Learning rate larger than 0.001 results in error

I have tried to hack together code from the Udacity Deep Learning course (Assignment 3 - Regularization) and the TensorFlow mnist_with_summaries.py tutorial. My code appears to run fine, but something strange is going on. The assignments all use a learning rate of 0.5 and at some point introduce exponential decay. However, the code I put together only runs fine when I set the learning rate to 0.001 (with or without decay). If I set the initial rate to 0.1 or greater, I get the following error:

Traceback (most recent call last):
  File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 175, in <module>
    summary, my_accuracy, _ = my_session.run([merged, accuracy, train_step], feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 340, in run
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 564, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 637, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 659, in _do_call
tensorflow.python.framework.errors.InvalidArgumentError: Nan in summary histogram for: layer1/weights/summaries/HistogramSummary
[[Node: layer1/weights/summaries/HistogramSummary = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](layer1/weights/summaries/HistogramSummary/tag, layer1/weights/Variable/read)]]
Caused by op u'layer1/weights/summaries/HistogramSummary', defined at:
  File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 106, in <module>
    layer1, weights_1 = nn_layer(x, num_features, 1024, 'layer1')
  File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 79, in nn_layer
    variable_summaries(weights, layer_name + '/weights')
  File "/Users/siakhnin/Documents/workspace/udacity_deep_learning/multi-layer-net.py", line 65, in variable_summaries
    tf.histogram_summary(name, var)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/logging_ops.py", line 113, in histogram_summary
    tag=tag, values=values, name=scope)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 55, in _histogram_summary
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/op_def_library.py", line 655, in apply_op
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2154, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1154, in __init__
    self._traceback = _extract_stack()

If I set the rate at 0.001 then the code runs to completion with a test accuracy of 0.94.

I am using TensorFlow 0.8 RC0 on Mac OS X.


It looks like your training is diverging, which produces infinities or NaNs. There is no simple explanation for why training diverges under some conditions but not others, but in general a higher learning rate makes divergence more likely.
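To make the divergence point concrete, here is a minimal sketch in plain Python rather than TensorFlow. It is not your network: the quadratic loss, the curvature value and the function name gradient_descent are all assumptions chosen for illustration. It only shows that the same update rule can shrink the weight at a small learning rate and blow up to NaN at a larger one.

```python
import math


def gradient_descent(learning_rate, curvature=10.0, w0=1.0, steps=1000):
    """Minimize the toy loss f(w) = 0.5 * curvature * w**2 (illustrative only)."""
    w = w0
    for _ in range(steps):
        grad = curvature * w  # derivative of the toy loss
        # Each update multiplies w by (1 - learning_rate * curvature).
        w = w - learning_rate * grad
    return w


# Factor per step is 0.99, so the weight decays towards the minimum at 0.
small = gradient_descent(learning_rate=0.001)

# Factor per step is -4, so |w| grows until it overflows to infinity,
# and the next update computes inf - inf, which is NaN.
large = gradient_descent(learning_rate=0.5)

print("lr=0.001 ->", small)
print("lr=0.5   ->", large, "(is NaN: %s)" % math.isnan(large))
```

In this toy case the threshold is known exactly (the update diverges once learning_rate * curvature exceeds 2). For a real network there is no such simple formula, which is why the learning rate that works has to be found by experiment.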

Edit, Apr 17: You are getting a NaN in your histogram summary, which most likely means there is a NaN in your weights or activations. NaNs are caused by numerically invalid calculations, e.g. taking the log of 0 and multiplying the result by 0. There is also a small chance of a bug in the histogram code itself; to rule this out, turn off summaries and see whether you can still train to good accuracy.
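The log-of-0-times-0 case can be reproduced in a few lines of plain Python. This is only a sketch under assumptions: I don't know that your loss is a hand-written cross-entropy, and the names naive_cross_entropy, clipped_cross_entropy and eps are made up for the example. Python's math.log raises on 0, so the sketch substitutes -inf by hand to mimic what floating-point log does inside a numeric library.

```python
import math


def naive_cross_entropy(labels, probs):
    """-sum(y * log(p)), with log(0) treated as -inf as IEEE floats do."""
    total = 0.0
    for y, p in zip(labels, probs):
        log_p = math.log(p) if p > 0.0 else float("-inf")
        total -= y * log_p  # 0 * -inf is NaN, and NaN poisons the sum
    return total


def clipped_cross_entropy(labels, probs, eps=1e-10):
    """Same loss, but probabilities are clipped away from 0 before the log."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0)
        total -= y * math.log(p)
    return total


labels = [1.0, 0.0]
probs = [1.0, 0.0]  # a fully saturated softmax output

print(naive_cross_entropy(labels, probs))    # NaN
print(clipped_cross_entropy(labels, probs))  # finite
```

If your graph does compute the log of a softmax output directly, the analogous guard in TensorFlow would be to clip the probabilities (for example with tf.clip_by_value) before taking the log, or to use the built-in tf.nn.softmax_cross_entropy_with_logits, which works from the logits instead.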

To turn off summaries, replace this line

merged = tf.merge_all_summaries()

with this

merged = tf.constant(1)

and comment out this line