Kendall Weihe - 6 months ago
Python Question

Tensorflow LSTM pixel-wise classification

How would one do pixel-wise classification for LSTM networks? Specifically, in Tensorflow.

My intuition tells me that the output tensors (pred & y from the code) should be 2-dimensional, with the same resolution as the input image. In other words, the input image would be 200x200 and the output classification would be 200x200.

The Udacity course includes an example LSTM network where the input image is 28x28. However, it classifies the image as a whole (handwritten digits from the MNIST dataset) rather than each pixel.

My thinking was that I could replace all tensors with dimensions [n_classes] with [n_input][n_steps] (code below). However, it throws an error at a matrix multiplication.

The Udacity example code looks partially like this:

n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])),
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}


def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    pdb.set_trace()
    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)


-----------------------------------------------------------------------------



And then my code looks like this:

n_input = 200 # data input (img shape: 200*200)
n_steps = 200 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 2 # total classes

# tf Graph input
x = tf.placeholder("float", [None, n_input, n_steps])
y = tf.placeholder("float", [None, n_input, n_steps])


# Define weights
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden]), dtype="float32"),
    'out': tf.Variable(tf.random_normal([n_hidden, n_input, n_steps]), dtype="float32")
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden]), dtype="float32"),
    'out': tf.Variable(tf.random_normal([n_input, n_steps]), dtype="float32")
}


def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Permuting batch_size and n_steps
    pdb.set_trace()
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    pdb.set_trace()

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    # return tf.matmul(outputs[-1], weights['out']) + biases['out']
    return tf.batch_matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)


The line return tf.batch_matmul(outputs[-1], weights['out']) + biases['out'] is where the problem is, because outputs is a list of 2D tensors and weights['out'] is a 3D tensor.

I thought maybe I could change the dimensions of outputs; however, that would require diving deep into the RNN object (in the API).

What are my options here? Could I do some reshaping? If so, what should I reshape and in what way?

Answer

You cannot do matrix multiplication between a 2D tensor and a 3D tensor of shape [n_hidden, n_input, n_steps].
What you can do is output a 2D tensor of shape [batch_size, n_input * n_steps] and then reshape it back to [batch_size, n_input, n_steps].

weights = {
    'hidden': ... ,
    'out': tf.Variable(tf.random_normal([n_hidden, n_input * n_steps]), dtype="float32")
}
biases = {
    'hidden': ... ,
    'out': tf.Variable(tf.random_normal([n_input * n_steps]), dtype="float32")
}
# ...

pred = RNN(x, weights, biases)
pred = tf.reshape(pred, [-1, n_input, n_steps])
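
The shape bookkeeping can be checked outside TensorFlow. The following is a minimal NumPy sketch, not the code from the question: the sizes are small stand-ins, and last_output, w_out, b_out and w_3d are hypothetical names for outputs[-1], weights['out'], biases['out'] and the 3D weight tensor the question tried to use. It suggests that flattening the output weights and reshaping afterwards gives the same numbers as contracting over n_hidden with a 3D weight tensor.

```python
import numpy as np

# Small stand-in sizes (assumed; the question uses 200x200 images and n_hidden = 128).
batch_size, n_hidden, n_input, n_steps = 4, 8, 5, 6

rng = np.random.default_rng(0)
last_output = rng.normal(size=(batch_size, n_hidden))    # stands in for outputs[-1]
w_out = rng.normal(size=(n_hidden, n_input * n_steps))   # flattened 'out' weights
b_out = rng.normal(size=(n_input * n_steps,))            # flattened 'out' biases

# Ordinary 2D matrix multiplication, then reshape to one value per pixel.
flat = last_output @ w_out + b_out                       # (batch_size, n_input * n_steps)
pred = flat.reshape(-1, n_input, n_steps)                # (batch_size, n_input, n_steps)

# The same computation written as a contraction over n_hidden with a 3D weight tensor.
w_3d = w_out.reshape(n_hidden, n_input, n_steps)
pred_3d = np.tensordot(last_output, w_3d, axes=([1], [0])) + b_out.reshape(n_input, n_steps)

print(pred.shape)
print(np.allclose(pred, pred_3d))
```

So nothing is lost by storing the output layer as a 2D matrix: the reshape only changes how the n_input * n_steps values are indexed.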

On your model

However, what you are doing here is running an RNN over every column of the image. You take each slice of the image (200 in total) as one time step and iterate through them, using only the last output to predict every pixel, which is unlikely to give good results.
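
To make the slicing concrete, here is a small NumPy sketch that mirrors the transpose, reshape and split steps of the RNN function in the question. The sizes are assumed stand-ins, and np.split takes its arguments in a different order from the tf.split call in the question, so treat this as an illustration of the data layout only.

```python
import numpy as np

batch_size, n_steps, n_input = 2, 3, 4   # small stand-in sizes, assumed for illustration

# A fake batch of images with shape (batch_size, n_steps, n_input).
x = np.arange(batch_size * n_steps * n_input, dtype=np.float32)
x = x.reshape(batch_size, n_steps, n_input)

# Mirror the three preparation steps from the RNN function.
xt = np.transpose(x, (1, 0, 2))          # (n_steps, batch_size, n_input)
xr = xt.reshape(-1, n_input)             # (n_steps * batch_size, n_input)
slices = np.split(xr, n_steps, axis=0)   # list of n_steps arrays, each (batch_size, n_input)

print(len(slices), slices[0].shape)

# Time step t holds slice t of every image in the batch.
same = all(np.array_equal(slices[t], x[:, t, :]) for t in range(n_steps))
print(same)
```

Each time step therefore sees a single 1D slice of the image, and the whole 2D prediction has to be produced from the hidden state after the final slice.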

If you want to work on images, I suggest you take a look at this tutorial from TensorFlow, where you can learn to use convolutions, which are much more effective than RNNs on images.
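
As a rough idea of why convolutions fit pixel-wise classification, the sketch below applies one "same"-padded convolution in plain NumPy so that every pixel gets its own vector of class scores. It is not taken from the TensorFlow tutorial: conv2d_same is a hypothetical helper, the weights are random and untrained, and the 20x20 image is an assumed stand-in for the 200x200 input.

```python
import numpy as np

def conv2d_same(image, kernels, biases):
    """Naive zero-padded convolution (cross-correlation form).

    image:   (H, W) single-channel input
    kernels: (k, k, n_classes) with odd k
    biases:  (n_classes,)
    returns: (H, W, n_classes) class scores, one vector per pixel
    """
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode="constant")
    height, width = image.shape
    scores = np.empty((height, width, kernels.shape[2]))
    for i in range(height):
        for j in range(width):
            patch = padded[i:i + k, j:j + k]   # (k, k) neighbourhood of pixel (i, j)
            scores[i, j] = np.tensordot(patch, kernels, axes=([0, 1], [0, 1])) + biases
    return scores

rng = np.random.default_rng(0)
image = rng.normal(size=(20, 20))      # stand-in for a 200x200 image
kernels = rng.normal(size=(3, 3, 2))   # n_classes = 2, as in the question
biases = np.zeros(2)

scores = conv2d_same(image, kernels, biases)
labels = scores.argmax(axis=-1)        # one class index per pixel

print(scores.shape, labels.shape)
```

The output keeps the input resolution by construction, and each pixel's scores depend on its local neighbourhood rather than on a single hidden state summarising the whole image.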