xolodec - 1 year ago 88
Python Question

# Fast sequential addition to rectangular subarray in numpy

I have come across a problem which is to rewrite a piece of code in vectorized form. The code shown below is a simplified illustration of initial problem

``````K = 20
h, w = 15, 20
H, W = 1000-h, 2000-w
q = np.random.randint(0, 20, size=(H, W, K))  # random just for illustration
Q = np.zeros((H+h, W+w, K))
for n in range(H):
for m in range(W):
Q[n:n+h, m:m+w, :] += q[n, m, :]
``````

This code takes long to execute and it seems to me it is rather simple to allow vectorized implementation.

I am aware of numpy's
`s_`
function which allows to construct slices which in turn can help in code vectorizing. But because every single element in
`Q`
is the result of multiple subsequent additions of elements from
`q`
I found it difficult to proceed in that simple way.

I guess that
`np.add.at`
could be useful to cope with sequential addition. But i have spent much time trying to make this two functions work for me and decided to ask for help because I constantly get an

``````IndexError: failed to coerce slice entry of type numpy.ndarray to integer
``````

for any attempt i make.

Maybe there is some another numpy's magic which I am unaware of and which could help me in my task but it seems extremely difficult to google for it.

Well you are basically summing across sliding windows along the first and second axes, which in signal processing domain is termed as `convolution`. For two axes that would be `2D convolution`. Now, Scipy has it implemented as `convolve2d` and could be used for each slice along the third axis.

Thus, we would have an implementation with it, like so -

``````from scipy.signal import convolve2d

kernel = np.ones((h,w),dtype=int)
m,n,r = q.shape[0]+h-1, q.shape[1]+w-1, q.shape[2]
out = np.empty((m,n,r),dtype=q.dtype)
for i in range(r):
out[...,i] = convolve2d(q[...,i],kernel)
``````

As it turns out, we can use `fftconvolve` from the same repo that allows us to work with higher-dimensional arrays. This would get us the output in a fully vectorized way, like so -

``````from scipy.signal import fftconvolve

out = fftconvolve(q,np.ones((h,w,1),dtype=int))
``````

Runtime test

Function definitions -

``````def original_app(q,K,h,w,H,W):
Q = np.zeros((H+h-1, W+w-1, K))
for n in range(H):
for m in range(W):
Q[n:n+h, m:m+w, :] += q[n, m, :]
return Q

def convolve2d_app(q,K,h,w,H,W):
kernel = np.ones((h,w),dtype=int)
m,n,r = q.shape[0]+h-1, q.shape[1]+w-1, q.shape[2]
out = np.empty((m,n,r),dtype=q.dtype)
for i in range(r):
out[...,i] = convolve2d(q[...,i],kernel)
return out

def fftconvolve_app(q,K,h,w,H,W):
return fftconvolve(q,np.ones((h,w,1),dtype=int))
``````

Timings and verification -

``````In [128]: # Setup inputs
...: K = 20
...: h, w = 15, 20
...: H, W = 200-h, 400-w
...: q = np.random.randint(0, 20, size=(H, W, K))
...:

In [129]: %timeit original_app(q,K,h,w,H,W)
1 loops, best of 3: 2.05 s per loop

In [130]: %timeit convolve2d_app(q,K,h,w,H,W)
1 loops, best of 3: 2.05 s per loop

In [131]: %timeit fftconvolve_app(q,K,h,w,H,W)
1 loops, best of 3: 233 ms per loop

In [132]: np.allclose(original_app(q,K,h,w,H,W),convolve2d_app(q,K,h,w,H,W))
Out[132]: True

In [133]: np.allclose(original_app(q,K,h,w,H,W),fftconvolve_app(q,K,h,w,H,W))
Out[133]: True
``````

So, it seems `fftconvolve` based approach is doing really well there!

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download