Shiva - 9 months ago 51

Python Question

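I'd like to discuss convolution as applied to CNNs and image filtering. If you have an RGB image (dimensions of, say, `3xIxI`) and `K` filters, each of size `3xFxF`, then you end up with a `Kx(I - F + 1)x(I - F + 1)` output, assuming a stride of `1` and only completely overlapping regions (no padding).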

From all the material I've read on convolution, you're basically sliding each filter over the image, and at each stage computing a large number of dot products and then summing them up to get a single value.

For example:

```
I -> 3x5x5 matrix
F -> 3x2x2 matrix
I * F -> 1x4x4 matrix
```

(Assume `*` denotes convolution.)

Now, since both your kernel and image have the same number of channels, you are going to end up separating your 3D convolution into a number of parallel 2D convolutions, followed by a matrix summation.

Therefore, the above example should for all intents and purposes (assuming there is no padding and we are only considering completely overlapping regions) be the same as this:

```
I -> 3x5x5 matrix
F -> 3x2x2 matrix
(I[0] * F[0]) + (I[1] * F[1]) + (I[2] * F[2]) -> 1x4x4 matrix
```

I am just separating each channel and convolving them independently. Please, look at this carefully and correct me if I'm wrong.

Now, on the assumption that this makes sense, I've carried out the following experiment in Python.

```
import scipy.signal
import numpy as np

x = np.random.randint(0, 10, (3, 5, 5)).astype(np.float32)
w = np.random.randint(0, 10, (3, 2, 2)).astype(np.float32)

# Per-channel 2D convolutions, summed over the channel axis
r1 = np.sum([scipy.signal.convolve(x[i], w[i], 'valid') for i in range(3)], axis=0).reshape(1, 4, 4)
# Single 3D convolution over the whole arrays
r2 = scipy.signal.convolve(x, w, 'valid')

print(r1.shape)
print(r1)
print(r2.shape)
print(r2)
```

This gives me the following result:

```
(1, 4, 4)
[[[ 268.  229.  297.  305.]
  [ 256.  292.  322.  190.]
  [ 173.  240.  283.  243.]
  [ 291.  271.  302.  346.]]]
(1, 4, 4)
[[[ 247.  229.  291.  263.]
  [ 198.  297.  342.  233.]
  [ 208.  268.  268.  185.]
  [ 276.  272.  280.  372.]]]
```

I'd just like to know whether this is due to:

- A bug in scipy (less likely)
- A mistake in my program (more likely)
- My misunderstanding of overlapping convolution (most likely)

Or any combination of the above. Thanks for reading!

Answer

You wrote:

... the same as this:

```
I -> 3x5x5 matrix
F -> 3x2x2 matrix
(I[0] * F[0]) + (I[1] * F[1]) + (I[2] * F[2]) -> 1x4x4 matrix
```

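You have forgotten that convolution *reverses* one of the arguments. For an N-dimensional convolution the reversal applies along every axis, including the channel axis, so the channels of `F` are paired with the channels of `I` in the opposite order. So the above is not true. Instead, the last line should be: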

```
(I[0] * F[2]) + (I[1] * F[1]) + (I[2] * F[0]) -> 1x4x4 matrix
```
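The reversal is easiest to see in one dimension. As a minimal sketch (the arrays `a` and `b` are made-up values, not taken from your data): when two sequences of equal length are convolved in `'valid'` mode, the single output value is the dot product of one with the *reversed* other. The channel axis in your example is exactly this situation, since both arrays have 3 channels.

```python
import numpy as np

# Made-up 1-D data standing in for the channel axis (length 3 each).
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# 'valid' convolution of equal-length inputs yields a single value.
conv = np.convolve(a, b, mode='valid')

# Convolution pairs a[i] with b[2 - i], i.e. with b reversed.
flipped_dot = np.dot(a, b[::-1])   # 1*30 + 2*20 + 3*10
plain_dot = np.dot(a, b)           # 1*10 + 2*20 + 3*30

print(conv, flipped_dot, plain_dot)
```

Here `conv` agrees with `flipped_dot`, not with `plain_dot`.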

For example,

```
In [28]: r1 = np.sum([scipy.signal.convolve(x[i], w[2-i], 'valid') for i in range(3)], axis=0).reshape(1, 4, 4)

In [29]: r2 = scipy.signal.convolve(x, w, 'valid')

In [30]: r1
Out[30]:
array([[[ 169.,  223.,  277.,  199.],
        [ 226.,  213.,  206.,  247.],
        [ 192.,  252.,  332.,  369.],
        [ 167.,  266.,  321.,  323.]]], dtype=float32)

In [31]: r2
Out[31]:
array([[[ 169.,  223.,  277.,  199.],
        [ 226.,  213.,  206.,  247.],
        [ 192.,  252.,  332.,  369.],
        [ 167.,  266.,  321.,  323.]]], dtype=float32)
```
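To check this on fresh random data without comparing printouts by eye, something like the following should work. It is a sketch that reuses the shapes of your `x` and `w`; the seed and the helper name `per_channel_sum` are my own additions. It also shows that `scipy.signal.correlate`, which does not flip its second argument, gives the channel-by-channel pairing you originally expected. Cross-correlation is the operation CNN frameworks typically implement under the name "convolution".

```python
import numpy as np
import scipy.signal

rng = np.random.RandomState(0)  # arbitrary seed, for repeatability only
x = rng.randint(0, 10, (3, 5, 5)).astype(np.float32)
w = rng.randint(0, 10, (3, 2, 2)).astype(np.float32)

def per_channel_sum(op, x, w, w_order):
    """Apply the 2-D operation `op` channel by channel and sum the results.

    Hypothetical helper: channel i of x is paired with channel w_order[i] of w.
    """
    parts = [op(x[i], w[j], 'valid') for i, j in enumerate(w_order)]
    return np.sum(parts, axis=0).reshape(1, 4, 4)

# Convolution: channel i of x pairs with channel 2 - i of w.
conv_split = per_channel_sum(scipy.signal.convolve, x, w, [2, 1, 0])
conv_full = scipy.signal.convolve(x, w, 'valid')

# Correlation: no flip, so channel i of x pairs with channel i of w.
corr_split = per_channel_sum(scipy.signal.correlate, x, w, [0, 1, 2])
corr_full = scipy.signal.correlate(x, w, 'valid')

print(np.allclose(conv_split, conv_full))
print(np.allclose(corr_split, corr_full))
```

Both comparisons should agree, so if the sliding dot product without any flipping is what you want, `correlate` is the function to use.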