I am training my conv-deconv net. My problem is that the cross entropy is always nan during training, so the solver never updates the weights. I checked my code all day but couldn't find where I went wrong. The following is my architecture:
Here is my cross entropy function:
# Flatten labels and predictions to a single column
ys_reshape = tf.reshape(ys, [-1, 1])
prediction = tf.reshape(relu4, [-1, 1])
# Cross entropy computed directly on the ReLU output
cross_entropy = tf.reduce_mean(-(ys_reshape * tf.log(prediction)))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)
You did a great job of narrowing the problem down to the right couple of lines of code.
So your predicted probability is directly the output of relu4.
There are two problems with that.
First: the ReLU output can be greater than one, so it is not a valid probability.
Second: it can be exactly zero (anywhere the input to relu4 is negative, its output will be zero).
log(0) evaluates to -inf, and 0 * -inf gives NaN, which then propagates through the mean and the gradients.
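You can see the failure mode in isolation (a tiny standalone check with NumPy, not part of your model):

import numpy as np
print(np.log(0.0))        # -inf (NumPy warns about divide by zero, but returns -inf)
print(0.0 * np.log(0.0))  # nan -- a single such entry makes reduce_mean nan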
The usual approach to this is to treat the linear activations z (no ReLU) as the log-odds of each class, with p = sigmoid(z) = 1 / (1 + exp(-z)). A naive implementation (applying the sigmoid and then the log yourself) is numerically broken: for large-magnitude logits it overflows or ends up taking log(0) again.
Since you have a single class, you should use tf.nn.sigmoid_cross_entropy_with_logits.
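Here is a minimal sketch of the fix. It assumes the pre-activation of your last layer is available as a tensor, called logits4 below (an illustrative name, not from your code):

# `logits4` stands for the linear output of the last layer, i.e. the
# tensor that previously fed into the final ReLU -- the name is illustrative.
ys_reshape = tf.reshape(ys, [-1, 1])
logits = tf.reshape(logits4, [-1, 1])

# Numerically stable: internally computed as
#   max(x, 0) - x * z + log(1 + exp(-|x|))
# so it never takes log(0).
cross_entropy = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=ys_reshape, logits=logits))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)

# If you need probabilities at inference time, apply the sigmoid separately:
prediction = tf.sigmoid(logits)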