Bob.S.P - 1 year ago 83

Python Question

I am trying to find an efficient code instead of the following piece of code (that is only one part of my code), to increase the speed:

`for pr in some_list:`

Tp = T[partition[pr]].sum(0)

Tpx = np.dot(Tp, xhat)

hp = h[partition[[pr]].sum(0)

up = (uk[partition[pr][:]].sum(0))/len(partition[pr])

hpu = hpu + np.dot(hp.T, up)

Tpu = Tpu + np.dot(Tp.T, up)

I have at least two more similar blocks of code. As you can see, I used fancy indexing three times (really couldn't find another way). In my algorithm, I need this part to be done very quickly, but it's not happening now. I will really appreciate any suggestion.

Thank you all.

Best,

Answer Source

If your partitions are few and have many elements each, you should consider swapping around the indices of your objects. Summing an array of shape `(30,1000)`

along its second dimension should be faster than summing an array of shape `(1000,30)`

along its first dimension, since in the former case you are always summing contiguous blocks of memory (i.e. `arr[k,:]`

for each `k`

) for each remaining index. So if you put the summation index last (and get rid of some trailing singleton dimension while you're at it), you might get speed-up.

As hpaulj noted in a comment, it's not clear how your loop could be vectorized. However, since it's performance-critical, you could still try vectorizing *some* of the work.

I suggest that you store `hp`

, `up`

and `Tp`

for each partition (following pre-allocation), then perform the scalar/matrix products in a single vectorized step. Also note that `Tpx`

is unused in your example, so I omitted it here (whatever you're doing with it, you can do it similarly to the other examples):

```
part_len = len(some_list) # number of partitions, N
Tpshape = (part_len,) + T.shape[1:] # (N,30,100) if T was (1000,30,100)
hpshape = (part_len,) + h.shape[1:] # (N,30,1) if h was (1000,30,1)
upshape = (part_len,) + uk.shape[1:] # (N,30,1) if uk was (1000,30,1)
Tp = np.zeros(Tpshape)
hp = np.zeros(hpshape)
up = np.zeros(upshape)
for ipr,pr in enumerate(some_list):
Tp[ipr,:,:] = T[partition[pr]].sum(0)
hp[ipr,:,:] = h[partition[[pr]].sum(0)
up[ipr,:,:] = uk[partition[pr]].sum(0)/len(partition[pr])
# compute vectorized dot products:
#Tpx unclear in original, omitted
# sum over second index (dot), sum over first index (sum in loop)
hpu = np.einsum('abc,abd->cd',hp,up) # shape (1,1)
Tpu = np.einsum('abc,abd->cd',Tp,up) # shape (100,1)
```

Clearly the key player is `numpy.einsum`

. And of course if `hpu`

and `Tpu`

had some prior values before the loop, you have to increment those values with the results from `einsum`

above.