Kendall Weihe - 5 months ago
Python Question

Tensorflow converging but bad predictions

I posted a similar question the other day, but I have since fixed some bugs that I found, and the problem of bad predictions remains.

I have two networks -- one with 3 conv layers and another with 3 conv layers followed by 3 deconv layers. Both take a 200x200 input image. The output has the same 200x200 resolution, but each pixel has two classes (either a 0 or a 1 -- it's a segmentation network), so the network's prediction has dimensions 200x200x2 (plus batch_size). Let's talk about the network with deconv layers.

Here's the weird thing... out of 10 training runs, maybe 3 of them will converge. The other 7 will diverge down to an accuracy of 0.0.

The conv and deconv layers are activated by a ReLU. The optimizer does something weird. When I print predictions after every training iteration, the magnitudes of the values start large -- which is correct considering they are all passed through ReLUs -- but after each iteration, the values get smaller until they are roughly between 0 and 2. I subsequently pass them through a sigmoid function (sigmoid_cross_entropy_with_logits) -- thus squashing the large negative values to 0 and the large positive values to 1. When I make predictions, I reactivate the outputs by passing them through the sigmoid function again.

So after the first iteration, prediction values are reasonable...

Accuracy = 0.508033
[[[[ 1. 0.]
[ 0. 1.]
[ 0. 0.]
...,
[ 1. 0.]
[ 1. 1.]
[ 1. 0.]]

[[ 0. 1.]
[ 1. 1.]
[ 0. 0.]
...,
[ 1. 1.]
[ 1. 1.]
[ 0. 1.]]


but then after some iterations, and let's say it actually converges this time, the prediction values look like... (because the optimizer makes the outputs smaller, they are all in that weird middle ground of the sigmoid function)

[[ 0.51028508 0.63202268]
[ 0.24386917 0.52015287]
[ 0.62086064 0.6953823 ]
...,
[ 0.2593964 0.13163178]
[ 0.24617286 0.5210492 ]
[ 0.24692698 0.5876413 ]]]]
Accuracy = 0.999913


Do I have the wrong optimizer function?

Here's the entire code... jump to def conv_net to see the network creation... and after that function is the definition of the cost function, optimizer, and accuracy. You'll notice when I measure accuracy and make predictions I reactivate the output with tf.nn.sigmoid(pred) -- this is because the cost function sigmoid_cross_entropy_with_logits combines the activation and the loss in the same function. In other words, pred (the network) outputs a linear value.

import tensorflow as tf
import pdb
import numpy as np
from numpy import genfromtxt
from PIL import Image

# Parameters
learning_rate = 0.001
training_iters = 10000
batch_size = 10
display_step = 1

# Network Parameters
n_input = 200 # MNIST data input (img shape: 28*28)
n_output = 40000
n_classes = 2 # MNIST total classes (0-9 digits)
#n_input = 200

dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input, n_input])
y = tf.placeholder(tf.float32, [None, n_input, n_input, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)


def convert_to_2_channel(x, batch_size):
    #assume input has dimension (batch_size,x,y)
    #output will have dimension (batch_size,x,y,2)
    output = np.empty((batch_size, 200, 200, 2))

    temp_arr1 = np.empty((batch_size, 200, 200))
    temp_arr2 = np.empty((batch_size, 200, 200))

    for i in xrange(batch_size):
        for j in xrange(3):
            for k in xrange(3):
                if x[i][j][k] == 1:
                    temp_arr1[i][j][k] = 1
                    temp_arr2[i][j][k] = 0
                else:
                    temp_arr1[i][j][k] = 0
                    temp_arr2[i][j][k] = 1

    for i in xrange(batch_size):
        for j in xrange(200):
            for k in xrange(200):
                for l in xrange(2):
                    if l == 0:
                        output[i][j][k][l] = temp_arr1[i][j][k]
                    else:
                        output[i][j][k][l] = temp_arr2[i][j][k]

    return output


# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x, weights, biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 200, 200, 1])

    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    #conv1 = tf.nn.local_response_normalization(conv1)
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    #conv2 = tf.nn.local_response_normalization(conv2)
    conv2 = maxpool2d(conv2, k=2)

    # Convolution Layer
    conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
    # # Max Pooling (down-sampling)
    #conv3 = tf.nn.local_response_normalization(conv3)
    conv3 = maxpool2d(conv3, k=2)

    temp_batch_size = tf.shape(x)[0]
    output_shape = [temp_batch_size, 50, 50, 64]
    conv4 = tf.nn.conv2d_transpose(conv3, weights['wdc1'], output_shape=output_shape, strides=[1,2,2,1], padding="VALID")
    conv4 = tf.nn.bias_add(conv4, biases['bdc1'])
    conv4 = tf.nn.relu(conv4)
    # conv4 = tf.nn.local_response_normalization(conv4)

    # output_shape = tf.pack([temp_batch_size, 100, 100, 32])
    output_shape = [temp_batch_size, 100, 100, 32]
    conv5 = tf.nn.conv2d_transpose(conv4, weights['wdc2'], output_shape=output_shape, strides=[1,2,2,1], padding="VALID")
    conv5 = tf.nn.bias_add(conv5, biases['bdc2'])
    conv5 = tf.nn.relu(conv5)
    # conv5 = tf.nn.local_response_normalization(conv5)

    # output_shape = tf.pack([temp_batch_size, 200, 200, 1])
    output_shape = [temp_batch_size, 200, 200, 2]
    conv6 = tf.nn.conv2d_transpose(conv5, weights['wdc3'], output_shape=output_shape, strides=[1,2,2,1], padding="VALID")
    conv6 = tf.nn.bias_add(conv6, biases['bdc3'])
    conv6 = tf.nn.relu(conv6)
    # pdb.set_trace()

    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv6, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)

    return (tf.add(tf.matmul(fc1, weights['out']), biases['out']))

# Store layers weight & bias
weights = {
# 5x5 conv, 1 input, 32 outputs
'wc1' : tf.Variable(tf.random_normal([5, 5, 1, 32])),
# 5x5 conv, 32 inputs, 64 outputs
'wc2' : tf.Variable(tf.random_normal([5, 5, 32, 64])),
# 5x5 conv, 64 inputs, 128 outputs
'wc3' : tf.Variable(tf.random_normal([5, 5, 64, 128])),

'wdc1' : tf.Variable(tf.random_normal([2, 2, 64, 128])),

'wdc2' : tf.Variable(tf.random_normal([2, 2, 32, 64])),

'wdc3' : tf.Variable(tf.random_normal([2, 2, 2, 32])),

# fully connected, 7*7*64 inputs, 1024 outputs
'wd1': tf.Variable(tf.random_normal([80000, 1024])),
# 1024 inputs, 10 outputs (class prediction)
'out': tf.Variable(tf.random_normal([1024, 80000]))
}

biases = {
'bc1': tf.Variable(tf.random_normal([32])),
'bc2': tf.Variable(tf.random_normal([64])),
'bc3': tf.Variable(tf.random_normal([128])),
'bdc1': tf.Variable(tf.random_normal([64])),
'bdc2': tf.Variable(tf.random_normal([32])),
'bdc3': tf.Variable(tf.random_normal([2])),
'bd1': tf.Variable(tf.random_normal([1024])),
'out': tf.Variable(tf.random_normal([80000]))
}

# Construct model
pred = conv_net(x, weights, biases, keep_prob)
pred = tf.reshape(pred, [-1,n_input,n_input,n_classes])
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
# cost = (tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(0,tf.cast(tf.sub(tf.nn.sigmoid(pred),y), tf.int32))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph)
    step = 1
    from tensorflow.contrib.learn.python.learn.datasets.scroll import scroll_data
    data = scroll_data.read_data('/home/kendall/Desktop/')
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = data.train.next_batch(batch_size)
        # Run optimization op (backprop)
        batch_x = batch_x.reshape((batch_size, n_input, n_input))
        batch_y = batch_y.reshape((batch_size, n_input, n_input))
        batch_y = convert_to_2_channel(batch_y, batch_size) #converts the 200x200 ground truth to a 200x200x2 classification
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        #measure prediction
        prediction = sess.run(tf.nn.sigmoid(pred), feed_dict={x: batch_x, keep_prob: 1.})
        print prediction
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            save_path = "model.ckpt"
            saver.save(sess, save_path)
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: dropout})
            print "Accuracy = " + str(acc)
            if acc > 0.73:
                break
        step += 1
    print "Optimization Finished!"

    #make prediction
    im = Image.open('/home/kendall/Desktop/HA900_frames/frame0035.tif')
    batch_x = np.array(im)
    # pdb.set_trace()
    batch_x = batch_x.reshape((1, n_input, n_input))
    batch_x = batch_x.astype(float)
    pdb.set_trace()
    prediction = sess.run(tf.nn.sigmoid(pred), feed_dict={x: batch_x, keep_prob: dropout})
    print prediction
    arr1 = np.empty((n_input,n_input))
    arr2 = np.empty((n_input,n_input))
    for i in xrange(n_input):
        for j in xrange(n_input):
            for k in xrange(2):
                if k == 0:
                    arr1[i][j] = (prediction[0][i][j][k])
                else:
                    arr2[i][j] = (prediction[0][i][j][k])
    # prediction = np.asarray(prediction)
    # prediction = np.reshape(prediction, (200,200))
    # np.savetxt("prediction.csv", prediction, delimiter=",")
    np.savetxt("prediction1.csv", arr1, delimiter=",")
    np.savetxt("prediction2.csv", arr2, delimiter=",")
    # np.savetxt("prediction2.csv", arr2, delimiter=",")

    # Calculate accuracy for 256 mnist test images
    print "Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: data.test.images[:256],
                                      y: data.test.labels[:256],
                                      keep_prob: 1.})


The correct_pred variable (the variable that measures accuracy) is a simple subtraction operator between the predictions and the ground truth, and then compared to zero (if the two are equivalent, then the difference should be zero).

Also, I have graphed the network, and it just looks very off to me. Here is a picture, I had to crop for viewing.

image1

image2

EDIT: I found out why my graph looks terrible (thanks Olivier), and I also tried changing my loss function, but to no avail -- it still diverges in the same manner

with tf.name_scope("loss") as scope:
    # cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
    temp_pred = tf.reshape(pred, [-1, 2])
    temp_y = tf.reshape(y, [-1, 2])
    cost = (tf.nn.softmax_cross_entropy_with_logits(temp_pred, temp_y))


EDIT full code now looks like this (still diverging)

import tensorflow as tf
import pdb
import numpy as np
from numpy import genfromtxt
from PIL import Image

# Parameters
learning_rate = 0.001
training_iters = 10000
batch_size = 10
display_step = 1

# Network Parameters
n_input = 200 # MNIST data input (img shape: 28*28)
n_output = 40000
n_classes = 2 # MNIST total classes (0-9 digits)
#n_input = 200

dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input, n_input])
y = tf.placeholder(tf.float32, [None, n_input, n_input, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)


def convert_to_2_channel(x, batch_size):
    #assume input has dimension (batch_size,x,y)
    #output will have dimension (batch_size,x,y,2)
    output = np.empty((batch_size, 200, 200, 2))

    temp_arr1 = np.empty((batch_size, 200, 200))
    temp_arr2 = np.empty((batch_size, 200, 200))

    for i in xrange(batch_size):
        for j in xrange(3):
            for k in xrange(3):
                if x[i][j][k] == 1:
                    temp_arr1[i][j][k] = 1
                    temp_arr2[i][j][k] = 0
                else:
                    temp_arr1[i][j][k] = 0
                    temp_arr2[i][j][k] = 1

    for i in xrange(batch_size):
        for j in xrange(200):
            for k in xrange(200):
                for l in xrange(2):
                    if l == 0:
                        output[i][j][k][l] = temp_arr1[i][j][k]
                    else:
                        output[i][j][k][l] = temp_arr2[i][j][k]

    return output


# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)

def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x, weights, biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 200, 200, 1])

    with tf.name_scope("conv1") as scope:
        # Convolution Layer
        conv1 = conv2d(x, weights['wc1'], biases['bc1'])
        # Max Pooling (down-sampling)
        #conv1 = tf.nn.local_response_normalization(conv1)
        conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    with tf.name_scope("conv2") as scope:
        conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
        # Max Pooling (down-sampling)
        #conv2 = tf.nn.local_response_normalization(conv2)
        conv2 = maxpool2d(conv2, k=2)

    # Convolution Layer
    with tf.name_scope("conv3") as scope:
        conv3 = conv2d(conv2, weights['wc3'], biases['bc3'])
        # # Max Pooling (down-sampling)
        #conv3 = tf.nn.local_response_normalization(conv3)
        conv3 = maxpool2d(conv3, k=2)


    temp_batch_size = tf.shape(x)[0]
    with tf.name_scope("deconv1") as scope:
        output_shape = [temp_batch_size, 50, 50, 64]
        conv4 = tf.nn.conv2d_transpose(conv3, weights['wdc1'], output_shape=output_shape, strides=[1,2,2,1], padding="VALID")
        conv4 = tf.nn.bias_add(conv4, biases['bdc1'])
        conv4 = tf.nn.relu(conv4)
        # conv4 = tf.nn.local_response_normalization(conv4)

    with tf.name_scope("deconv2") as scope:
        # output_shape = tf.pack([temp_batch_size, 100, 100, 32])
        output_shape = [temp_batch_size, 100, 100, 32]
        conv5 = tf.nn.conv2d_transpose(conv4, weights['wdc2'], output_shape=output_shape, strides=[1,2,2,1], padding="VALID")
        conv5 = tf.nn.bias_add(conv5, biases['bdc2'])
        conv5 = tf.nn.relu(conv5)
        # conv5 = tf.nn.local_response_normalization(conv5)

    with tf.name_scope("deconv3") as scope:
        # output_shape = tf.pack([temp_batch_size, 200, 200, 1])
        output_shape = [temp_batch_size, 200, 200, 2]
        conv6 = tf.nn.conv2d_transpose(conv5, weights['wdc3'], output_shape=output_shape, strides=[1,2,2,1], padding="VALID")
        conv6 = tf.nn.bias_add(conv6, biases['bdc3'])
        # conv6 = tf.nn.relu(conv6)
        # pdb.set_trace()
        conv6 = tf.nn.dropout(conv6, dropout)

    return conv6
    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    # fc1 = tf.reshape(conv6, [-1, weights['wd1'].get_shape().as_list()[0]])
    # fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    # fc1 = tf.nn.relu(fc1)
    # # Apply Dropout
    # fc1 = tf.nn.dropout(fc1, dropout)
    #
    # return (tf.add(tf.matmul(fc1, weights['out']), biases['out']))# Store layers weight & bias

weights = {
# 5x5 conv, 1 input, 32 outputs
'wc1' : tf.Variable(tf.random_normal([5, 5, 1, 32])),
# 5x5 conv, 32 inputs, 64 outputs
'wc2' : tf.Variable(tf.random_normal([5, 5, 32, 64])),
# 5x5 conv, 64 inputs, 128 outputs
'wc3' : tf.Variable(tf.random_normal([5, 5, 64, 128])),

'wdc1' : tf.Variable(tf.random_normal([2, 2, 64, 128])),

'wdc2' : tf.Variable(tf.random_normal([2, 2, 32, 64])),

'wdc3' : tf.Variable(tf.random_normal([2, 2, 2, 32])),

# fully connected, 7*7*64 inputs, 1024 outputs
'wd1': tf.Variable(tf.random_normal([80000, 1024])),
# 1024 inputs, 10 outputs (class prediction)
'out': tf.Variable(tf.random_normal([1024, 80000]))
}

biases = {
'bc1': tf.Variable(tf.random_normal([32])),
'bc2': tf.Variable(tf.random_normal([64])),
'bc3': tf.Variable(tf.random_normal([128])),
'bdc1': tf.Variable(tf.random_normal([64])),
'bdc2': tf.Variable(tf.random_normal([32])),
'bdc3': tf.Variable(tf.random_normal([2])),
'bd1': tf.Variable(tf.random_normal([1024])),
'out': tf.Variable(tf.random_normal([80000]))
}

# Construct model
# with tf.name_scope("net") as scope:
pred = conv_net(x, weights, biases, keep_prob)
pred = tf.reshape(pred, [-1,n_input,n_input,n_classes])
# Define loss and optimizer
with tf.name_scope("loss") as scope:
    # cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(pred, y))
    temp_pred = tf.reshape(pred, [-1, 2])
    temp_y = tf.reshape(y, [-1, 2])
    cost = (tf.nn.softmax_cross_entropy_with_logits(temp_pred, temp_y))

with tf.name_scope("opt") as scope:
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    # optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)


# Evaluate model
with tf.name_scope("acc") as scope:
    correct_pred = tf.equal(0,tf.cast(tf.sub(tf.nn.softmax(temp_pred),y), tf.int32))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    summary = tf.train.SummaryWriter('/tmp/logdir/', sess.graph)
    step = 1
    from tensorflow.contrib.learn.python.learn.datasets.scroll import scroll_data
    data = scroll_data.read_data('/home/kendall/Desktop/')
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = data.train.next_batch(batch_size)
        # Run optimization op (backprop)
        batch_x = batch_x.reshape((batch_size, n_input, n_input))
        batch_y = batch_y.reshape((batch_size, n_input, n_input))
        batch_y = convert_to_2_channel(batch_y, batch_size) #converts the 200x200 ground truth to a 200x200x2 classification
        batch_y = batch_y.reshape(batch_size * n_input * n_input, 2)
        sess.run(optimizer, feed_dict={x: batch_x, temp_y: batch_y,
                                       keep_prob: dropout})
        #measure prediction
        prediction = sess.run(tf.nn.softmax(temp_pred), feed_dict={x: batch_x, keep_prob: dropout})
        print prediction
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            save_path = "model.ckpt"
            saver.save(sess, save_path)
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: dropout})
            print "Accuracy = " + str(acc)
            if acc > 0.73:
                break
        step += 1
    print "Optimization Finished!"

    #make prediction
    im = Image.open('/home/kendall/Desktop/HA900_frames/frame0035.tif')
    batch_x = np.array(im)
    # pdb.set_trace()
    batch_x = batch_x.reshape((1, n_input, n_input))
    batch_x = batch_x.astype(float)
    pdb.set_trace()
    prediction = sess.run(tf.nn.sigmoid(pred), feed_dict={x: batch_x, keep_prob: dropout})
    print prediction
    arr1 = np.empty((n_input,n_input))
    arr2 = np.empty((n_input,n_input))
    for i in xrange(n_input):
        for j in xrange(n_input):
            for k in xrange(2):
                if k == 0:
                    arr1[i][j] = (prediction[0][i][j][k])
                else:
                    arr2[i][j] = (prediction[0][i][j][k])
    # prediction = np.asarray(prediction)
    # prediction = np.reshape(prediction, (200,200))
    # np.savetxt("prediction.csv", prediction, delimiter=",")
    np.savetxt("prediction1.csv", arr1, delimiter=",")
    np.savetxt("prediction2.csv", arr2, delimiter=",")
    # np.savetxt("prediction2.csv", arr2, delimiter=",")

    # Calculate accuracy for 256 mnist test images
    print "Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: data.test.images[:256],
                                      y: data.test.labels[:256],
                                      keep_prob: 1.})

Answer

The concept of deconvolution is to output something of the same size as the input.

At the line:

conv6 = tf.nn.bias_add(conv6, biases['bdc3'])

You have this output of shape [batch_size, 200, 200, 2], so you don't need to add your fully connected layers. Just return conv6 (without the final ReLU).
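
To see why conv6 already has the right spatial size, the layer sizes can be checked by hand. The sketch below (plain Python 3, not TensorFlow) assumes the usual size relations: a SAME-padded 2x2 max pool with stride 2 rounds the size up to ceil(in / 2), and a transposed convolution with VALID padding gives (in - 1) * stride + kernel. The helper names are made up for illustration.

```python
import math

def pool_same(size, k=2):
    """Spatial size after a k x k max pool with stride k and SAME padding."""
    return int(math.ceil(size / float(k)))

def deconv_valid(size, kernel=2, stride=2):
    """Spatial size after a transposed convolution with VALID padding."""
    return (size - 1) * stride + kernel

size = 200
down = []
for _ in range(3):          # conv1..conv3: the SAME conv keeps the size, the pool halves it
    size = pool_same(size)
    down.append(size)

up = []
for _ in range(3):          # three conv2d_transpose layers, 2x2 kernel, stride 2
    size = deconv_valid(size)
    up.append(size)

print(down)  # [100, 50, 25]
print(up)    # [50, 100, 200]
```

These sizes agree with the output_shape values passed to conv2d_transpose in the question (50, 100, 200), so the last deconv layer is already back at the input resolution.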


If you use 2 categories in your prediction and the true labels y, you need to use tf.nn.softmax_cross_entropy_with_logits(), not the sigmoid cross entropy.

Make sure that y always has values like: y[i, j] = [0., 1.] or y[i, j] = [1., 0.]

pred = conv_net(x, weights, biases, keep_prob)  # NEW prediction conv6
pred = tf.reshape(pred, [-1, n_classes])
y_flat = tf.reshape(y, [-1, n_classes])  # flatten the labels the same way
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y_flat))
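
As a rough illustration of what this loss computes per pixel, here is a NumPy sketch (Python 3; it is not the TensorFlow implementation). It assumes the logits and the one-hot labels are both flattened to shape [N, 2]; softmax_xent is a hypothetical helper name and the numbers are made up.

```python
import numpy as np

def softmax_xent(logits, labels):
    """Per-row softmax cross-entropy for logits and one-hot labels of shape [N, C]."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

# Toy "image": batch of 1, 2x2 pixels, 2 classes.
logits = np.array([[[[2.0, 0.0], [0.0, 2.0]],
                    [[0.0, 0.0], [5.0, -5.0]]]])
labels = np.array([[[[1.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [0.0, 1.0]]]])

flat_logits = logits.reshape(-1, 2)   # same idea as tf.reshape(pred, [-1, n_classes])
flat_labels = labels.reshape(-1, 2)   # the labels need the same reshape

loss = softmax_xent(flat_logits, flat_labels)
print(loss.shape)   # (4,)
print(loss.mean())  # scalar cost, as with tf.reduce_mean
```

The last pixel has a confident logit for the wrong class, so its loss is by far the largest of the four; the third pixel, with equal logits, costs log(2).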

And if you want your TensorBoard graph to look nice (or at least readable), make sure to use tf.name_scope()


EDIT:

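Your accuracy is also wrong. You check whether softmax(pred) - y, cast to tf.int32, equals 0. But softmax(pred) is never exactly 0. or 1., and the integer cast truncates any difference strictly between -1 and 1 to 0, so the result does not tell you whether the predicted class matches the label.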

Here is what you should do:

with tf.name_scope("acc") as scope:
    correct_pred = tf.equal(tf.argmax(temp_pred, 1), tf.argmax(temp_y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
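
The difference between the two accuracy measures shows up on a toy example. This NumPy sketch (Python 3) imitates the original check with astype(np.int32), which truncates toward zero like a float-to-int cast; the probabilities and labels are made up.

```python
import numpy as np

# Made-up softmax outputs for 4 pixels, 2 classes, and their one-hot labels.
probs = np.array([[0.9, 0.1],
                  [0.4, 0.6],
                  [0.7, 0.3],
                  [0.2, 0.8]])
labels = np.array([[1.0, 0.0],
                   [1.0, 0.0],   # predicted class 1, true class 0 -> wrong
                   [0.0, 1.0],   # predicted class 0, true class 1 -> wrong
                   [0.0, 1.0]])

# Original-style check: subtract, cast to int, compare with 0.
# Every difference lies strictly between -1 and 1, so the cast gives 0 everywhere.
cast_acc = np.mean((probs - labels).astype(np.int32) == 0)

# argmax-based check: compare the predicted class with the true class.
argmax_acc = np.mean(np.argmax(probs, axis=1) == np.argmax(labels, axis=1))

print(cast_acc)    # 1.0
print(argmax_acc)  # 0.5
```

The cast-based number ignores which class has the larger score, while the argmax-based number counts exactly the pixels whose predicted class matches the label.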

EDIT 2:

The real error was a typo in convert_to_2_channel, in the loops

for j in xrange(3):
    for k in xrange(3):

Both bounds should be 200 instead of 3.
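
With the loop bounds fixed, the function fills all 200x200 pixels. As an aside, the same conversion can be written without explicit loops. The following is a NumPy sketch in Python 3 (not the asker's code); it keeps the original convention that channel 0 is 1 where the mask equals 1 and channel 1 is 1 everywhere else, and it takes the sizes from the input instead of hard-coding 200.

```python
import numpy as np

def convert_to_2_channel(mask):
    """Turn a (batch, H, W) 0/1 mask into a (batch, H, W, 2) one-hot array."""
    mask = np.asarray(mask)
    is_one = (mask == 1)
    # Channel 0 marks pixels equal to 1, channel 1 marks everything else.
    return np.stack([is_one, ~is_one], axis=-1).astype(np.float32)

mask = np.array([[[1, 0],
                  [0, 1]]])          # batch of one 2x2 mask
out = convert_to_2_channel(mask)
print(out.shape)                     # (1, 2, 2, 2)
print(out[0, 0, 0], out[0, 0, 1])    # [1. 0.] [0. 1.]
```

Because there are no hand-written index ranges, a wrong loop bound cannot leave part of the array unfilled.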

Lesson: when debugging, print everything step by step with very simple examples, and you will find that the buggy function returns bad output.
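
One way to apply this is a small check on the labels before they are fed to the network. The helper below is hypothetical (Python 3, NumPy only): it tests that every pixel is exactly [1, 0] or [0, 1]. The "bad" array stands in for labels where only the top-left 3x3 corner was filled in; the original function used np.empty, so the unfilled part there would hold arbitrary values rather than zeros.

```python
import numpy as np

def is_one_hot(y):
    """True if every pixel of y (shape [..., n_classes]) is exactly one-hot."""
    y = np.asarray(y)
    only_zeros_and_ones = np.isin(y, (0.0, 1.0)).all()
    sums_to_one = (y.sum(axis=-1) == 1.0).all()
    return bool(only_zeros_and_ones and sums_to_one)

# A correctly converted 4x4 label map: every pixel is [1, 0] or [0, 1].
good = np.zeros((1, 4, 4, 2))
good[..., 1] = 1.0
good[0, 0, 0] = [1.0, 0.0]

# A label map where only the top-left 3x3 corner was filled in.
bad = np.zeros((1, 4, 4, 2))
bad[0, :3, :3, 1] = 1.0

print(is_one_hot(good))  # True
print(is_one_hot(bad))   # False
```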