I am trying to find an efficient code instead of the following piece of code (that is only one part of my code), to increase the speed:
for pr in some_list:
Tp = T[partition[pr]].sum(0)
Tpx = np.dot(Tp, xhat)
hp = h[partition[[pr]].sum(0)
up = (uk[partition[pr][:]].sum(0))/len(partition[pr])
hpu = hpu + np.dot(hp.T, up)
Tpu = Tpu + np.dot(Tp.T, up)
If your partitions are few and have many elements each, you should consider swapping around the indices of your objects. Summing an array of shape
(30,1000) along its second dimension should be faster than summing an array of shape
(1000,30) along its first dimension, since in the former case you are always summing contiguous blocks of memory (i.e.
arr[k,:] for each
k) for each remaining index. So if you put the summation index last (and get rid of some trailing singleton dimension while you're at it), you might get speed-up.
As hpaulj noted in a comment, it's not clear how your loop could be vectorized. However, since it's performance-critical, you could still try vectorizing some of the work.
I suggest that you store
Tp for each partition (following pre-allocation), then perform the scalar/matrix products in a single vectorized step. Also note that
Tpx is unused in your example, so I omitted it here (whatever you're doing with it, you can do it similarly to the other examples):
part_len = len(some_list) # number of partitions, N Tpshape = (part_len,) + T.shape[1:] # (N,30,100) if T was (1000,30,100) hpshape = (part_len,) + h.shape[1:] # (N,30,1) if h was (1000,30,1) upshape = (part_len,) + uk.shape[1:] # (N,30,1) if uk was (1000,30,1) Tp = np.zeros(Tpshape) hp = np.zeros(hpshape) up = np.zeros(upshape) for ipr,pr in enumerate(some_list): Tp[ipr,:,:] = T[partition[pr]].sum(0) hp[ipr,:,:] = h[partition[[pr]].sum(0) up[ipr,:,:] = uk[partition[pr]].sum(0)/len(partition[pr]) # compute vectorized dot products: #Tpx unclear in original, omitted # sum over second index (dot), sum over first index (sum in loop) hpu = np.einsum('abc,abd->cd',hp,up) # shape (1,1) Tpu = np.einsum('abc,abd->cd',Tp,up) # shape (100,1)
Clearly the key player is
numpy.einsum. And of course if
Tpu had some prior values before the loop, you have to increment those values with the results from