This question also exists as a GitHub issue.
I would like to build a neural network in Keras which contains both 2D convolutions and an LSTM layer.
The network should classify MNIST.
The training data in MNIST are 60000 grey-scale images of handwritten digits from 0 to 9. Each image is 28x28 pixels.
I've split the images into four parts (left/right, up/down) and rearranged them in four orders to get sequences for the LSTM.
|     |      |1 | 2|
|image|  ->  -------  ->  4 sequences: |1|2|3|4|, |4|3|2|1|, |1|3|2|4|, |4|2|3|1|
|     |      |3 | 4|
Each sub-image is 14 x 14 pixels. The four sequences are stacked together along the width (it shouldn't matter whether width or height).
This creates an array with the shape [60000, 4, 1, 56, 14], where:
- 60000 is the number of samples
- 4 is the number of elements in a sequence (# of timesteps)
- 1 is the number of color channels (greyscale)
- 56 and 14 are width and height
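The construction described above can be sketched in NumPy roughly as follows. This is only an illustration under assumptions: the function name `make_sequences` is hypothetical, random data stands in for the real MNIST images, and the quadrants are joined along the first spatial axis (the one the shape list calls width).

```python
import numpy as np

def make_sequences(images):
    """Turn (N, 28, 28) images into an array of shape (N, 4, 1, 56, 14)."""
    # Quadrants numbered as in the diagram: 1 2 / 3 4 (stored at indices 0..3).
    quads = [
        images[:, :14, :14],  # 1: upper left
        images[:, :14, 14:],  # 2: upper right
        images[:, 14:, :14],  # 3: lower left
        images[:, 14:, 14:],  # 4: lower right
    ]
    # The four orderings 1-2-3-4, 4-3-2-1, 1-3-2-4, 4-2-3-1 (zero-based).
    orders = [(0, 1, 2, 3), (3, 2, 1, 0), (0, 2, 1, 3), (3, 1, 2, 0)]
    frames = []
    for t in range(4):
        # At timestep t, join the t-th quadrant of every ordering: 4 x 14 = 56.
        frames.append(np.concatenate([quads[o[t]] for o in orders], axis=1))
    x = np.stack(frames, axis=1)       # (N, 4, 56, 14)
    return x[:, :, np.newaxis, :, :]   # add the channel axis -> (N, 4, 1, 56, 14)

# Random stand-in data instead of the real MNIST images.
demo_images = np.random.rand(8, 28, 28)
demo_x = make_sequences(demo_images)
print(demo_x.shape)  # (8, 4, 1, 56, 14)
```

With the full training set, the leading 8 would become 60000.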
Now this should be given to a Keras model.
The problem is how to change the dimensions between the CNN and the LSTM.
I searched online and found this question: Python keras how to change the size of input after convolution layer into lstm layer
The solution seems to be a Reshape layer which flattens the image but retains the timesteps (as opposed to a Flatten layer which would collapse everything but the batch_size).
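The difference between the two layers can be illustrated with plain NumPy reshapes. This is only an analogy, not Keras code: the array sizes are made up, and it assumes the tensor still carries a timestep axis at that point.

```python
import numpy as np

# Hypothetical batch: 2 samples, 4 timesteps, 3 feature maps of 5 x 5.
x = np.zeros((2, 4, 3, 5, 5))

# Flatten-like: everything except the batch axis is collapsed.
flat = x.reshape(x.shape[0], -1)

# Reshape-like: only the per-timestep axes are collapsed, the timesteps survive.
seq = x.reshape(x.shape[0], x.shape[1], -1)

print(flat.shape)  # (2, 300)
print(seq.shape)   # (2, 4, 75)
```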
Here's my code so far:
model.add(Convolution2D(nb_filters, kernel_size, kernel_size, border_mode="valid", input_shape=[1, 56, 14]))
model.add(Convolution2D(nb_filters, kernel_size, kernel_size))
This code raises an error:
ValueError: total size of new array must be unchanged
Apparently the input to the Reshape layer is incorrect. As an alternative, I tried to pass the timesteps to the Reshape layer, too:
This doesn't feel right and, in any case, the error stays the same.
Am I doing this the right way?
Is a Reshape layer the proper tool to connect a CNN and an LSTM?
There are also rather complex approaches to this problem, such as this:
A TimeDistributed Layer which seems to hide the timestep dimension from following layers.
Or this: https://github.com/anayebi/keras-extra
A set of special layers for combining CNNs and LSTMs.
Why are there such complicated solutions (at least they seem complicated to me) if a simple Reshape does the trick?
Embarrassingly, I forgot that the dimensions will be changed by the pooling and (for lack of padding) the convolutions, too.
I was advised to check the dimensions of each layer's output.
The output shape of the layer before the Reshape layer is:
(None, 32, 26, 5)
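This shape can be reproduced by hand. The sketch below assumes two 3x3 convolutions without padding, 32 filters and a 2x2 max pooling; these hyperparameters are not stated in the text, they are merely consistent with the reported output. The helper names are hypothetical.

```python
def conv_valid(size, kernel):
    # A convolution without padding shrinks each spatial axis by kernel - 1.
    return size - kernel + 1

def pool(size, window):
    # Non-overlapping pooling divides each spatial axis by the window size.
    return size // window

nb_filters, kernel, window = 32, 3, 2   # assumed values
width, height = 56, 14                  # one timestep of the input

for _ in range(2):                      # two convolution layers
    width, height = conv_valid(width, kernel), conv_valid(height, kernel)
width, height = pool(width, window), pool(height, window)

out_shape = (nb_filters, width, height)
n_elements = nb_filters * width * height
print(out_shape, n_elements)   # (32, 26, 5) 4160
print(56 * 14)                 # 784
```

Any Reshape target therefore has to contain exactly 4160 elements per sample; a target based on the 56 x 14 input (784 elements) would not.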
I changed the reshape to:
Now the ValueError is gone; instead, the LSTM complains:
Exception: Input 0 is incompatible with layer lstm_5: expected ndim=3, found ndim=2
It seems like I need to pass the timestep dimension through the entire network. How can I do that? If I add it to the input_shape of the Convolution, it complains, too:
Convolution2D(nb_filters, kernel_size, kernel_size, border_mode="valid", input_shape=[4, 1, 56,14])
Exception: Input 0 is incompatible with layer convolution2d_44: expected ndim=4, found ndim=5
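Both exceptions are about the number of axes: the convolution wants 4-D input (batch, channels, width, height), while the LSTM wants 3-D input (batch, timesteps, features). One way to satisfy both, and as far as I understand roughly what a TimeDistributed wrapper does, is to fold the timestep axis into the batch axis before the convolutions and unfold it again afterwards. Below is a NumPy sketch of the shape bookkeeping only; `fake_cnn` is a hypothetical stand-in that merely returns an array of the right shape, and the batch size of 6 is arbitrary.

```python
import numpy as np

def fake_cnn(frames):
    """Stand-in for the conv/pool stack: (M, 1, 56, 14) -> (M, 32, 26, 5)."""
    assert frames.ndim == 4          # same requirement as Convolution2D
    return np.zeros((frames.shape[0], 32, 26, 5))

x = np.zeros((6, 4, 1, 56, 14))      # (samples, timesteps, channels, width, height)
n, t = x.shape[:2]

folded = x.reshape((n * t,) + x.shape[2:])   # (24, 1, 56, 14): timesteps hidden in the batch
features = fake_cnn(folded)                  # (24, 32, 26, 5)
lstm_input = features.reshape(n, t, -1)      # (6, 4, 4160): 3-D, as an LSTM expects

print(folded.shape, features.shape, lstm_input.shape)
```

In Keras itself this would presumably mean wrapping the convolution, pooling and flattening layers in TimeDistributed and giving the shape with the timestep axis to the first wrapped layer; I have not verified that here.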