Bananach - 2 months ago 27

Python Question

I currently have the following double loop in my Python code:

`for i in range(a):`

for j in range(b):

A[:,i]*=B[j][:,C[i,j]]

(A is a float matrix. B is a list of float matrices. C is a matrix of integers. By matrices I mean m x n np.arrays.

To be precise, the sizes are: A: mxa B: b matrices of size mxl (with l different for each matrix) C: axb. Here m is very large, a is very large, b is small, the l's are even smaller than b

)

I tried to speed it up by doing

`for j in range(b):`

A[:,:]*=B[j][:,C[:,j]]

but surprisingly to me this performed worse.

More precisely, this did improve performance for small values of m and a (the "large" numbers), but from m=7000,a=700 onwards the first appraoch is roughly twice as fast.

Is there anything else I can do?

Maybe I could parallelize? But I don't really know how.

(I am not committed to either Python 2 or 3)

Answer

Here's a vectorized approach assuming `B`

as a list of arrays that are of the same shape -

```
# Convert B to a 3D array
B_arr = np.asarray(B)
# Use advanced indexing to index into the last axis of B array with C
# and then do product-reduction along the second axis.
# Finally, we perform elementwise multiplication with A
A *= B_arr[np.arange(B_arr.shape[0]),:,C].prod(1).T
```

For cases with smaller `a`

, we could run a loop that iterates through the length of `a`

instead. Also, for more performance, it might be a better idea to store those elements into a separate 2D array instead and perform the elementwise multiplication only once after we get out of the loop.

Thus, we would have an alternative implementation like so -

```
range_arr = np.arange(B_arr.shape[0])
out = np.empty_like(A)
for i in range(a):
out[:,i] = B_arr[range_arr,:,C[i,:]].prod(0)
A *= out
```