xolodec xolodec - 1 month ago 11
Python Question

Fast sequential addition to rectangular subarray in numpy

I have come across a problem which is to rewrite a piece of code in vectorized form. The code shown below is a simplified illustration of initial problem

K = 20
h, w = 15, 20
H, W = 1000-h, 2000-w
q = np.random.randint(0, 20, size=(H, W, K)) # random just for illustration
Q = np.zeros((H+h, W+w, K))
for n in range(H):
for m in range(W):
Q[n:n+h, m:m+w, :] += q[n, m, :]


This code takes long to execute and it seems to me it is rather simple to allow vectorized implementation.

I am aware of numpy's
s_
function which allows to construct slices which in turn can help in code vectorizing. But because every single element in
Q
is the result of multiple subsequent additions of elements from
q
I found it difficult to proceed in that simple way.

I guess that
np.add.at
could be useful to cope with sequential addition. But i have spent much time trying to make this two functions work for me and decided to ask for help because I constantly get an

IndexError: failed to coerce slice entry of type numpy.ndarray to integer


for any attempt i make.

Maybe there is some another numpy's magic which I am unaware of and which could help me in my task but it seems extremely difficult to google for it.

Answer

Well you are basically summing across sliding windows along the first and second axes, which in signal processing domain is termed as convolution. For two axes that would be 2D convolution. Now, Scipy has it implemented as convolve2d and could be used for each slice along the third axis.

Thus, we would have an implementation with it, like so -

from scipy.signal import convolve2d

kernel = np.ones((h,w),dtype=int)
m,n,r = q.shape[0]+h-1, q.shape[1]+w-1, q.shape[2]
out = np.empty((m,n,r),dtype=q.dtype)
for i in range(r):
    out[...,i] = convolve2d(q[...,i],kernel)

As it turns out, we can use fftconvolve from the same repo that allows us to work with higher-dimensional arrays. This would get us the output in a fully vectorized way, like so -

from scipy.signal import fftconvolve

out = fftconvolve(q,np.ones((h,w,1),dtype=int))

Runtime test

Function definitions -

def original_app(q,K,h,w,H,W):
    Q = np.zeros((H+h-1, W+w-1, K))
    for n in range(H):
        for m in range(W):
            Q[n:n+h, m:m+w, :] += q[n, m, :]
    return Q        

def convolve2d_app(q,K,h,w,H,W):
    kernel = np.ones((h,w),dtype=int)
    m,n,r = q.shape[0]+h-1, q.shape[1]+w-1, q.shape[2]
    out = np.empty((m,n,r),dtype=q.dtype)
    for i in range(r):
        out[...,i] = convolve2d(q[...,i],kernel)
    return out

def fftconvolve_app(q,K,h,w,H,W):
    return fftconvolve(q,np.ones((h,w,1),dtype=int))

Timings and verification -

In [128]: # Setup inputs
     ...: K = 20
     ...: h, w = 15, 20 
     ...: H, W = 200-h, 400-w
     ...: q = np.random.randint(0, 20, size=(H, W, K))
     ...: 

In [129]: %timeit original_app(q,K,h,w,H,W)
1 loops, best of 3: 2.05 s per loop

In [130]: %timeit convolve2d_app(q,K,h,w,H,W)
1 loops, best of 3: 2.05 s per loop

In [131]: %timeit fftconvolve_app(q,K,h,w,H,W)
1 loops, best of 3: 233 ms per loop

In [132]: np.allclose(original_app(q,K,h,w,H,W),convolve2d_app(q,K,h,w,H,W))
Out[132]: True

In [133]: np.allclose(original_app(q,K,h,w,H,W),fftconvolve_app(q,K,h,w,H,W))
Out[133]: True

So, it seems fftconvolve based approach is doing really well there!