Bananach - 1 month ago
Python Question

Speed up double for loop in numpy

I currently have the following double loop in my Python code:

for i in range(a):
    for j in range(b):
        A[:, i] = A[:, i] * B[j][:, C[i, j]]

(A is a float matrix, B is a list of float matrices, and C is a matrix of integers. By matrices I mean m x n np.arrays.)

To be precise, the sizes are: A is m x a; B is a list of b matrices, each of size m x l (with l different for each matrix); C is a x b. Here m is very large, a is very large, b is small, and the l's are even smaller than b.

I tried to speed it up by doing

for j in range(b):
    A = A * B[j][:, C[:, j]]

but surprisingly to me this performed worse.

More precisely, this did improve performance for small values of m and a (the "large" numbers), but from m=7000, a=700 onwards the first approach is roughly twice as fast.

Is there anything else I can do?

Maybe I could parallelize? But I don't really know how.

(I am not committed to either Python 2 or 3)


Here's a vectorized approach, assuming B is a list of arrays that all have the same shape -

# Convert B to a 3D array
B_arr = np.asarray(B)

# Use advanced indexing to index into the last axis of B array with C
# and then do product-reduction along the second axis. 
# Finally, we perform elementwise multiplication with A
A *= B_arr[np.arange(B_arr.shape[0]),:,C].prod(1).T
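
Assuming the original double loop computes A[:, i] *= B[j][:, C[i, j]], here's a small self-contained check of the vectorized line against that loop (the sizes m, a, b, l below are illustrative, not the poster's actual ones, and all B matrices share the same l since that's what this approach requires):

```python
import numpy as np

# Hypothetical small sizes for checking correctness
m, a, b, l = 5, 4, 3, 2
rng = np.random.default_rng(0)
A = rng.random((m, a))
B = [rng.random((m, l)) for _ in range(b)]   # all the same l here
C = rng.integers(0, l, size=(a, b))

# Reference: the naive double loop from the question
expected = A.copy()
for i in range(a):
    for j in range(b):
        expected[:, i] = expected[:, i] * B[j][:, C[i, j]]

# Vectorized: index the last axis of the stacked B with C,
# product-reduce over the b axis, transpose to match A's (m, a) layout
B_arr = np.asarray(B)                        # shape (b, m, l)
result = A * B_arr[np.arange(b), :, C].prod(1).T

assert np.allclose(result, expected)
```

The advanced indexing broadcasts np.arange(b) (shape (b,)) against C (shape (a, b)) to produce an (a, b, m) array, so prod(1) reduces over b and the transpose yields the (m, a) shape of A.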

For cases with smaller a, we could instead run a loop over a. Also, for more performance, it might be better to store those per-column products in a separate 2D array and perform the elementwise multiplication only once after the loop.

Thus, we would have an alternative implementation like so -

range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
    out[:,i] = B_arr[range_arr,:,C[i,:]].prod(0)
A *= out
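
As a quick sanity check, the loop-over-a variant can be compared against the fully vectorized expression on small random data (sizes here are again hypothetical):

```python
import numpy as np

# Hypothetical small sizes for checking
m, a, b, l = 5, 4, 3, 2
rng = np.random.default_rng(1)
A = rng.random((m, a))
B_arr = rng.random((b, m, l))                # stacked B, shape (b, m, l)
C = rng.integers(0, l, size=(a, b))

# Fully vectorized reference
ref = A * B_arr[np.arange(b), :, C].prod(1).T

# Alternative: loop over a, store products, multiply once at the end
range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
    # (b, m) slice selecting column C[i, j] from each B[j], reduced over b
    out[:, i] = B_arr[range_arr, :, C[i, :]].prod(0)
res = A * out

assert np.allclose(res, ref)
```

This variant avoids materializing the full (a, b, m) intermediate at once, which can matter when a and m are both large.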