mgray - 1 year ago

tensorflow.merge_all_summaries() hangs

I've been having a problem where, if I run sess.run(summary_op) during training, the program hangs. This was also brought up in a GitHub issue, although I'm not sure my problem is the same.

For reference, this is the code I'm using to train:

import tensorflow as tf

# sess (an existing tf.Session), fcn8, lb, softmax_loss, build_graph,
# global_step, pipe and FLAGS are defined elsewhere in the training script
logits = fcn8.upscore # last layer of the network
loss = softmax_loss(logits, lb, pipe.NUM_CLASSES)

train_op = build_graph(loss, global_step)
saver = tf.train.Saver(tf.all_variables())
summary_op = tf.merge_all_summaries()
summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)

for step in range(FLAGS.max_epochs * pipe.EPOCH_LENGTH):
    if ... == 0:  # condition and body elided in the post
        ...

    _, loss_val = sess.run([train_op, loss])

    if step % 10 == 0:
        print('loss at step {}: {}'.format(step, loss_val))
        summary = sess.run(summary_op) # hangs here
        summary_writer.add_summary(summary, step)

Is this a common thing? Or is there some error I've made in writing the training code? Thanks in advance for any help.

EDIT: It seems the only time this happens is when the queue is empty at the moment I try to evaluate the merged summaries. I wonder whether this is a coincidence.

Answer

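Your summary_op most likely depends on a queue dequeue (for example, a summary on a tensor computed from the dequeued input batch), so each sess.run(summary_op) performs its own dequeue, which blocks indefinitely when the queue is empty.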

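One workaround is to restructure your code using variables so that the summaries don't trigger a queue dequeue (for example, by storing each dequeued batch in a variable and computing both the training step and the summaries from that variable), as in the answer to "TensorFlow: Reading images in queue without shuffling".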
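The variable-based idea can be sketched with the same analogy (again plain Python, not the TensorFlow API). CachedBatch is a hypothetical stand-in for a variable that receives exactly one dequeued batch per step; both the "train" read and the "summary" read use that cached value, so the summary never dequeues anything itself.

```python
import queue


class CachedBatch:
    """Hypothetical stand-in for a variable holding the current batch."""

    def __init__(self, input_queue):
        self._queue = input_queue
        self._value = None

    def load(self, timeout_s=0.01):
        # The only dequeue per step (analogous to assigning a dequeued batch to a variable)
        self._value = self._queue.get(timeout=timeout_s)

    def value(self):
        # Reading the cached batch never touches the queue
        return self._value


def run_steps_cached(num_batches, num_steps, summary_every):
    """Return (batches_trained_on, batches_summarized, batches_left_in_queue)."""
    input_queue = queue.Queue()
    for b in range(num_batches):
        input_queue.put(b)

    batch = CachedBatch(input_queue)
    trained, summarized = [], []
    for step in range(num_steps):
        batch.load()
        trained.append(batch.value())  # "train_op" reads the cached batch
        if step % summary_every == 0:
            summarized.append(batch.value())  # "summary_op" reads the same batch
    return trained, summarized, input_queue.qsize()


print(run_steps_cached(6, 6, 2))
```

Here run_steps_cached(6, 6, 2) trains on all six batches in order and summarizes batches 0, 2 and 4. Because the dequeue happens in exactly one place, adding or removing summaries does not change how fast the queue drains.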

A simpler workaround is to create your Session with a deadline (the operation_timeout_in_ms field of ConfigProto), so that a dequeue on an empty queue fails with DeadlineExceededError after that time instead of hanging:

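import tensorflow as tf
queue = tf.FIFOQueue(capacity=5, dtypes=[tf.int32])
config = tf.ConfigProto()
config.operation_timeout_in_ms = 2000  # deadline for blocking ops (illustrative value)
sess = tf.InteractiveSession("", config=config)
try:
    sess.run(queue.dequeue())  # queue is empty: fails after the deadline instead of hanging
except tf.errors.DeadlineExceededError:
    print("DeadlineExceededError detected")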
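Applied to the loop in the question, the deadline turns a hang into an exception that can be caught around the summary fetch, so the loop can log and skip that summary instead of blocking forever. The sketch below shows only that control flow in plain Python, assuming queue.Empty from queue.Queue.get(timeout=...) as a stand-in for tf.errors.DeadlineExceededError; fetch_summary and training_loop are hypothetical names, not TensorFlow functions.

```python
import queue


def fetch_summary(input_queue, timeout_s):
    """Stand-in for sess.run(summary_op) with operation_timeout_in_ms set:
    dequeue a batch to summarize, or return None if none arrives within
    timeout_s (queue.Empty plays the role of DeadlineExceededError)."""
    try:
        return input_queue.get(timeout=timeout_s)
    except queue.Empty:
        return None


def training_loop(batches, num_steps, summary_every, timeout_s=0.01):
    """Return (written_summaries, skipped_steps)."""
    input_queue = queue.Queue()
    for batch in batches:
        input_queue.put(batch)

    written, skipped = [], []
    for step in range(num_steps):
        # (the training step itself is omitted from this analogy)
        if step % summary_every == 0:
            summary = fetch_summary(input_queue, timeout_s)
            if summary is None:
                skipped.append(step)  # log and skip instead of hanging
            else:
                written.append((step, summary))  # stands in for add_summary
    return written, skipped


print(training_loop(["a", "b"], 8, 2))
```

For example, training_loop(["a", "b"], 8, 2) writes summaries at steps 0 and 2 and skips steps 4 and 6 once the queue is empty.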