mgray - 3 months ago
Python Question

tensorflow.merge_all_summaries() hangs

I've been having a problem where if I run

sess.run(tf.merge_all_summaries())
during training, the program hangs. This was also brought up in a GitHub issue, although I'm not sure that my problem is the same.

For reference, this is the code I'm using to train:

logits = fcn8.upscore # last layer of the network
loss = softmax_loss(logits, lb, pipe.NUM_CLASSES)

train_op = build_graph(loss, global_step)
saver = tf.train.Saver(tf.all_variables())
summary_op = tf.merge_all_summaries()

sess.run(tf.initialize_all_variables())
tf.train.start_queue_runners(sess=sess)
summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)

for step in range(FLAGS.max_epochs * pipe.EPOCH_LENGTH):
    if sess.run(queue.size()) == 0:
        sess.run(enqueue_files)

    _, loss_val = sess.run([train_op, loss])

    if step % 10 == 0:
        print('loss at step {}: {}'.format(step, loss_val))
        summary = sess.run(summary_op) # hangs here
        summary_writer.add_summary(summary, step)


Is this a common thing? Or is there some error I've made in writing the training code? Thanks in advance for any help.

EDIT: It seems like the only time this happens is when the queue is empty when I try to merge summaries. I wonder if this is a coincidence.

Answer

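Your summary_op likely triggers a queue dequeue, which hangs when the queue is empty. Each sess.run call evaluates everything its fetches depend on, so if the summaries depend on dequeued data, the separate sess.run(summary_op) call performs its own dequeue on top of the one done by the training step.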

One work-around is to restructure your code using variables so that summaries don't trigger a queue dequeue, as in "TensorFlow: Reading images in queue without shuffling".
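
The idea behind that work-around can be illustrated without TensorFlow. What follows is a plain-Python analogy using the standard queue module, not TensorFlow code, and CachedBatch, advance, loss and summary are hypothetical names made up for the sketch. The batch is dequeued once per step into a stored value (the role a variable would play in the graph), and both the loss and the summary read that stored value, so computing a summary never touches the queue.

```python
import queue


class CachedBatch:
    """Hypothetical helper: keeps the last dequeued item, like a variable would."""

    def __init__(self, source):
        self.source = source
        self.value = None

    def advance(self):
        # The only place that dequeues; raises queue.Empty instead of blocking.
        self.value = self.source.get_nowait()
        return self.value


def loss(batch):
    # Stand-in computation that reads the cached value.
    return sum(batch.value)


def summary(batch):
    # Reads the same cached value, so it never dequeues.
    return {"batch_size": len(batch.value), "loss": loss(batch)}


source = queue.Queue()
source.put([1, 2, 3])  # exactly one batch is available

batch = CachedBatch(source)
batch.advance()                # the queue is empty after this call
step_loss = loss(batch)
step_summary = summary(batch)  # still works although the queue is empty
```

In the real graph the equivalent step would be an assignment from the dequeue op into a variable, with the loss and the summaries built on the variable rather than on the dequeue op.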

A simpler work-around is to initialize your Session with a deadline, so that dequeues on an empty queue fail with DeadlineExceededError after some time rather than hanging:

tf.reset_default_graph()
queue = tf.FIFOQueue(capacity=5, dtypes=[tf.int32])
config = tf.ConfigProto()
config.operation_timeout_in_ms=2000
sess = tf.InteractiveSession("", config=config)
try:
    sess.run(queue.dequeue())
except tf.errors.DeadlineExceededError:
    print("DeadlineExceededError detected")
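
The same "fail after a deadline instead of blocking forever" behaviour can be reproduced with Python's standard queue module. This is only an analogy for the session timeout above, not TensorFlow code; dequeue_with_deadline is a hypothetical helper, and the 0.1 second timeout is an arbitrary value chosen for the sketch.

```python
import queue

q = queue.Queue(maxsize=5)  # empty, like the FIFOQueue above


def dequeue_with_deadline(source, timeout_s):
    """Return (item, timed_out); never blocks longer than timeout_s seconds."""
    try:
        return source.get(timeout=timeout_s), False
    except queue.Empty:
        # Plays the role of DeadlineExceededError in the session example.
        return None, True


item, timed_out = dequeue_with_deadline(q, 0.1)
if timed_out:
    print("dequeue timed out on an empty queue")
```

A training loop could use the timed-out flag to skip the summary for that step rather than stalling, though whether skipping is acceptable depends on the pipeline.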