lbollar - 3 months ago
Python Question

conv2d_transpose is dependent on batch_size when making predictions

I have a neural network currently implemented in TensorFlow, but I am having trouble making predictions after training because the network contains conv2d_transpose operations whose shapes depend on the batch size. I have a layer that requires output_shape as an argument:

def deconvLayer(input, filter_shape, output_shape, strides):
    W1_1 = weight_variable(filter_shape)

    output = tf.nn.conv2d_transpose(input, W1_1, output_shape, strides, padding="SAME")

    return output


That layer is used in a larger model I have constructed, as follows:

conv3 = layers.convLayer(conv2['layer_output'], [3, 3, 64, 128], use_pool=False)

conv4 = layers.deconvLayer(conv3['layer_output'],
                           filter_shape=[2, 2, 64, 128],
                           output_shape=[batch_size, 32, 40, 64],
                           strides=[1, 2, 2, 1])


The problem is that when I make a prediction with the trained model, my test data must have the same batch size, or else I get the following error:

tensorflow.python.framework.errors.InvalidArgumentError: Conv2DBackpropInput: input and out_backprop must have the same batch size


Is there some way to get a prediction for an input with a variable batch size? When I look at the trained weights, nothing seems to depend on the batch size, so I can't see why this would be a problem.

Answer

I came across a solution based on a thread in the TensorFlow issue tracker: https://github.com/tensorflow/tensorflow/issues/833.

In my code

conv4 = layers.deconvLayer(conv3['layer_output'],
                           filter_shape=[2, 2, 64, 128],
                           output_shape=[batch_size, 32, 40, 64],
                           strides=[1, 2, 2, 1])

the output shape passed to deconvLayer was hard-coded with the batch size that was predetermined at training time. Altering the layer to the following fixed it:

def deconvLayer(input, filter_shape, output_shape, strides):
    W1_1 = weight_variable(filter_shape)

    # Read the input's shape at run time; element 0 is the batch size
    dyn_input_shape = tf.shape(input)
    batch_size = dyn_input_shape[0]

    # Rebuild output_shape as a dynamic tensor: keep the static spatial and
    # channel dims, but substitute the run-time batch size.
    # (Note: tf.pack was renamed tf.stack in TensorFlow 1.0.)
    output_shape = tf.pack([batch_size, output_shape[1], output_shape[2], output_shape[3]])

    output = tf.nn.conv2d_transpose(input, W1_1, output_shape, strides, padding="SAME")

    return output

This allows the output shape to be inferred dynamically at run time, so the layer can handle a variable batch size.

Running the code, I no longer receive this error for any batch size of test data. I believe this workaround is necessary because shape inference for transpose ops is currently not as straightforward as it is for normal convolution ops. Where we would usually pass None for the batch size in a normal convolution, conv2d_transpose requires a concrete output shape, and since that shape varies with the input, we must determine it dynamically.
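For SAME padding, conv2d_transpose simply multiplies each spatial dimension by its stride, which is why the only unknowable part of output_shape is the batch size. A minimal sketch of that arithmetic (plain Python, no TensorFlow required; the function name is mine, chosen for illustration):

```python
def transpose_output_shape(input_shape, strides, filter_out_channels, padding="SAME"):
    """Compute the output shape of a conv2d_transpose op.

    input_shape: (batch, height, width, channels) -- batch may be None/unknown
    strides:     (1, stride_h, stride_w, 1), as passed to conv2d_transpose
    """
    batch, h, w, _ = input_shape
    if padding == "SAME":
        # SAME padding: each spatial dim is scaled by its stride
        out_h, out_w = h * strides[1], w * strides[2]
    else:
        # VALID padding also depends on the filter size; omitted in this sketch
        raise NotImplementedError("only SAME padding handled here")
    return (batch, out_h, out_w, filter_out_channels)

# A 16x20 feature map with stride 2 upsamples to 32x40, for any batch size
print(transpose_output_shape((None, 16, 20, 128), (1, 2, 2, 1), 64))
# -> (None, 32, 40, 64)
```

Everything except the batch entry is fixed by the architecture, which matches the fix above: keep the static dims and substitute tf.shape(input)[0] for the batch size.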