Kendall Weihe - 1 year ago 418

Python Question

How would one do pixel-wise classification with LSTM networks, specifically in TensorFlow?

My intuition tells me that the output tensors (`pred` and `y`) need one value per pixel rather than one value per image.

The Udacity course includes an example LSTM network whose input image is 28x28. However, it classifies each image as a whole (handwritten digits from the MNIST dataset).

My thinking was that I could replace all tensors with dimensions `[n_classes]` with `[n_input][n_steps]`.
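The change of label shape described above can be sketched in NumPy, used here only so the shapes can be checked without a TensorFlow session. The batch size of 4 and the random mask are placeholders for real data, and the one-hot class axis is one common encoding, not something the question specifies. For two classes, a 0/1 value per pixel (shape `[n_input][n_steps]`) carries the same information as a per-pixel one-hot vector.

```python
import numpy as np

batch_size = 4              # hypothetical batch size, for illustration only
n_input, n_steps = 200, 200
n_classes = 2

# Per-image labels (MNIST-style): one one-hot vector per image.
image_labels = np.zeros((batch_size, n_classes), dtype=np.float32)

# Per-pixel labels: one class index (0 or 1) per pixel, i.e. a binary mask.
# Random values stand in for real annotations here.
rng = np.random.default_rng(0)
pixel_mask = rng.integers(0, n_classes, size=(batch_size, n_input, n_steps))

# One-hot encode every pixel: [batch_size, n_input, n_steps, n_classes].
pixel_labels = np.eye(n_classes, dtype=np.float32)[pixel_mask]

print(image_labels.shape)   # (4, 2)
print(pixel_mask.shape)     # (4, 200, 200)
print(pixel_labels.shape)   # (4, 200, 200, 2)
```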

The Udacity example code looks partially like this:

```python
import tensorflow as tf
from tensorflow.python.ops import rnn, rnn_cell  # TensorFlow 0.x (pre-1.0) API

n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden])),
    'out': tf.Variable(tf.random_normal([n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)
```
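To see what the three reshaping lines inside `RNN` produce, the same operations can be replayed in NumPy. In the TensorFlow 0.x API used above, `tf.split(0, n_steps, x)` takes the split axis first; `np.split(flat, n_steps, axis=0)` is the NumPy counterpart. The batch size of 3 is arbitrary. The sketch shows that each of the `n_steps` tensors has shape `(batch_size, n_input)` and that step `t` holds row `t` of every image in the batch, so the LSTM reads each image one row per timestep.

```python
import numpy as np

batch_size, n_steps, n_input = 3, 28, 28   # batch_size is arbitrary here

# Input as fed to the placeholder: (batch_size, n_steps, n_input)
x = np.arange(batch_size * n_steps * n_input, dtype=np.float32).reshape(
    batch_size, n_steps, n_input)

# Permute batch_size and n_steps -> (n_steps, batch_size, n_input)
xt = np.transpose(x, (1, 0, 2))

# Flatten to (n_steps * batch_size, n_input)
flat = xt.reshape(-1, n_input)

# Split into a list of n_steps arrays of shape (batch_size, n_input)
steps = np.split(flat, n_steps, axis=0)

print(len(steps), steps[0].shape)   # 28 (3, 28)
```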

My adapted code looks like this:

```python
import tensorflow as tf
from tensorflow.python.ops import rnn, rnn_cell  # TensorFlow 0.x (pre-1.0) API

n_input = 200 # data input (img shape: 200*200)
n_steps = 200 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 2 # data total classes

# tf Graph input
x = tf.placeholder("float", [None, n_input, n_steps])
y = tf.placeholder("float", [None, n_input, n_steps])

# Define weights
weights = {
    'hidden': tf.Variable(tf.random_normal([n_input, n_hidden]), dtype="float32"),
    'out': tf.Variable(tf.random_normal([n_hidden, n_input, n_steps]), dtype="float32")
}
biases = {
    'hidden': tf.Variable(tf.random_normal([n_hidden]), dtype="float32"),
    'out': tf.Variable(tf.random_normal([n_input, n_steps]), dtype="float32")
}

def RNN(x, weights, biases):
    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2])
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, n_input])
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    # This input shape is required by `rnn` function
    x = tf.split(0, n_steps, x)

    # Define a lstm cell with tensorflow
    lstm_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Get lstm cell output
    outputs, states = rnn.rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    # return tf.matmul(outputs[-1], weights['out']) + biases['out']
    return tf.batch_matmul(outputs[-1], weights['out']) + biases['out']

pred = RNN(x, weights, biases)
```

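The line

`return tf.batch_matmul(outputs[-1], weights['out']) + biases['out']`

fails because `outputs` and `weights['out']` have incompatible shapes: `outputs[-1]` is 2-D (`[batch_size, n_hidden]`), while `weights['out']` is 3-D (`[n_hidden, n_input, n_steps]`).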

I thought maybe I could change the dimensions of `outputs`.

What are my options here? Could I do some reshaping? If so, what should I reshape and in what way?

Answer

You cannot do a matrix multiplication between the 2-D `outputs[-1]` and a weight tensor of shape `[n_hidden, n_input, n_steps]`, because that weight tensor has 3 dimensions.

What you can do is output a vector of dimension `[batch_size, n_input * n_steps]` and then reshape it back to `[batch_size, n_input, n_steps]`:

```
weights = {
'hidden': ... ,
'out': tf.Variable(tf.random_normal([n_hidden, n_input * n_steps]), dtype="float32")
}
biases = {
'hidden': ... ,
'out': tf.Variable(tf.random_normal([n_input * n_steps]), dtype="float32")
}
# ...
pred = RNN(x, weights, biases)
pred = tf.reshape(pred, [-1, n_input, n_steps])
```
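Flattening the output layer loses nothing: a `[n_hidden, n_input * n_steps]` weight matrix holds exactly the same parameters as the 3-D `[n_hidden, n_input, n_steps]` tensor from the question, just laid out differently. The NumPy sketch below uses small, arbitrary sizes and random values standing in for `outputs[-1]`, `weights['out']` and `biases['out']`. It checks that "2-D matmul, then reshape" gives the same numbers as contracting `n_hidden` against the 3-D tensor with `np.tensordot`.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_hidden, n_input, n_steps = 2, 8, 5, 6   # small sizes for the demo

last_output = rng.standard_normal((batch_size, n_hidden))   # stands in for outputs[-1]
w_out = rng.standard_normal((n_hidden, n_input * n_steps))  # weights['out'], flattened
b_out = rng.standard_normal(n_input * n_steps)              # biases['out'], flattened

# Ordinary 2-D matmul, then reshape to one score per pixel
pred = (last_output @ w_out + b_out).reshape(-1, n_input, n_steps)
print(pred.shape)   # (2, 5, 6)

# Same result as contracting n_hidden against the equivalent 3-D weight tensor
w_3d = w_out.reshape(n_hidden, n_input, n_steps)
pred_3d = np.tensordot(last_output, w_3d, axes=1) + b_out.reshape(n_input, n_steps)
```

At the sizes in the question, `weights['out']` has 128 × 200 × 200 = 5,120,000 entries, whichever layout is used.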

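However, what you are doing here is running an RNN over the columns of the image: you take every slice of the image (200 in total) and iterate through them, and then every one of the 200 × 200 pixel predictions has to be decoded from the last LSTM output alone (a vector of `n_hidden` = 128 values). This will not give good results at all.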

If you want to work on images, I suggest you take a look at this tutorial from TensorFlow, where you can learn to use **convolutions**, which are much more effective than RNNs on images.
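The property that makes convolutions a natural fit for pixel-wise classification can be shown with a naive NumPy convolution. This is not the tutorial's code; `conv2d_same` is a hypothetical helper written for illustration, and a real network would stack several such layers with nonlinearities (for example `tf.nn.conv2d` with `padding='SAME'`). With zero "same" padding, every output position corresponds to one input pixel. The layer therefore produces a `[height, width, n_classes]` score map directly, and each pixel's scores depend only on its local neighbourhood.

```python
import numpy as np

def conv2d_same(image, kernels, bias):
    """Naive 2-D convolution (cross-correlation, as in most deep-learning
    libraries) with zero 'same' padding. Hypothetical helper for illustration.

    image:   (H, W)             single-channel input
    kernels: (k, k, n_classes)  one k x k filter per output class (k odd)
    bias:    (n_classes,)
    returns: (H, W, n_classes)  one score vector per pixel
    """
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(image, pad, mode="constant")  # zero padding keeps the output H x W
    h, w = image.shape
    out = np.empty((h, w, kernels.shape[2]))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + k, j:j + k]  # k x k neighbourhood centred on pixel (i, j)
            out[i, j] = np.tensordot(patch, kernels, axes=2) + bias
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((20, 20))     # small stand-in for a 200x200 image
kernels = rng.standard_normal((3, 3, 2))  # n_classes = 2
bias = np.zeros(2)

scores = conv2d_same(image, kernels, bias)
labels = scores.argmax(axis=-1)           # predicted class for every pixel
print(scores.shape, labels.shape)         # (20, 20, 2) (20, 20)
```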