muon muon - 1 year ago 138
Python Question

Sum groups of rows of a NumPy matrix using a list of lists of indices

I want to slice a NumPy array using lists of row indices and apply a function (here, a sum) to each group. Is it possible to vectorize this, or is there a non-vectorized way to do it? A vectorized approach would be ideal for large matrices.

import numpy as np
index = [[1,3], [2,4,5]]
a = np.array(
[[ 3, 4, 6, 3],
[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[1, 1, 4, 5]])


Summing by the groups of row indices in index gives:

np.array([[8, 10, 12, 14],
[17, 19, 24, 27]])

Answer Source

Approach #1 : Here's an almost* vectorized approach -

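def sumrowsby_index(a, index):
    # Flatten the groups into one array of row indices
    index_arr = np.concatenate(index)
    # Number of rows in each group
    lens = np.array([len(i) for i in index])
    # Start offset of each group within index_arr
    cut_idx = np.concatenate(([0], lens[:-1].cumsum() ))
    # Gather the rows, then sum each contiguous segment along axis 0
    return np.add.reduceat(a[index_arr], cut_idx)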

*Almost, because lens is computed with a list comprehension. That step only reads the length of each group and does no arithmetic, so it shouldn't sway the timings in any big way.
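
To make the reduceat step easier to follow, here is a minimal sketch that prints nothing but keeps the intermediate values of Approach #1 for the question's data. The variable names gathered and result are mine, added only for illustration.

```python
import numpy as np

a = np.array([[ 3,  4,  6,  3],
              [ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11],
              [12, 13, 14, 15],
              [ 1,  1,  4,  5]])
index = [[1, 3], [2, 4, 5]]

# Flattened row indices: [1, 3, 2, 4, 5]
index_arr = np.concatenate(index)

# Group sizes: [2, 3]
lens = np.array([len(i) for i in index])

# Start offset of each group inside index_arr: [0, 2]
cut_idx = np.concatenate(([0], lens[:-1].cumsum()))

# Rows of a in group order; rows 0-1 belong to group 0, rows 2-4 to group 1
gathered = a[index_arr]

# Sum gathered[0:2] and gathered[2:] along axis 0 (the default axis)
result = np.add.reduceat(gathered, cut_idx)
```

One caveat, as far as I understand the np.add.reduceat documentation: when two consecutive offsets are equal, which is what an empty group would produce, reduceat returns the row at that offset instead of zeros. So this sketch assumes every group in index is non-empty.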

Sample run -

In [716]: a
Out[716]: 
array([[ 3,  4,  6,  3],
       [ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [ 1,  1,  4,  5]])

In [717]: index
Out[717]: [[1, 3], [2, 4, 5]]

In [718]: sumrowsby_index(a, index)
Out[718]: 
array([[ 8, 10, 12, 14],
       [17, 19, 24, 27]])
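
Since the question also asks about a non-vectorized way, a plain loop over the groups can serve as a reference to check the output above. This is only a baseline sketch; the name sumrowsby_index_loop is hypothetical and not part of the original answer.

```python
import numpy as np

a = np.array([[ 3,  4,  6,  3],
              [ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11],
              [12, 13, 14, 15],
              [ 1,  1,  4,  5]])
index = [[1, 3], [2, 4, 5]]

def sumrowsby_index_loop(a, index):
    # Hypothetical baseline: select each group's rows and sum them along axis 0
    return np.array([a[idx].sum(axis=0) for idx in index])

baseline = sumrowsby_index_loop(a, index)
```

It loops in Python once per group, so it should be slower when there are many small groups, but it is easy to verify by hand.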

Approach #2 : We could leverage fast matrix-multiplication with numpy.dot to perform those sum-reductions, giving us another method as listed below -

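def sumrowsby_index_v2(a, index):
    # Number of rows in each group
    lens = np.array([len(i) for i in index])
    # Indicator matrix, one row per group and one column per row of a
    # (np.zeros defaults to float64, so the result is a float array)
    id_ar = np.zeros((len(lens), a.shape[0]))
    # Column positions: the row indices of a, flattened
    c = np.concatenate(index)
    # Row positions: the group number repeated once per member
    r = np.repeat(np.arange(len(index)), lens)
    id_ar[r,c] = 1
    # Each output row is the sum of the rows of a flagged for that group
    return id_ar.dot(a)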
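
A quick usage sketch for the matrix-multiplication idea on the question's data. It is a variant, not the function above: the name sumrowsby_index_v2_typed is hypothetical, and passing dtype=a.dtype to np.zeros is my assumption so that an integer input gives an integer result rather than floats.

```python
import numpy as np

a = np.array([[ 3,  4,  6,  3],
              [ 0,  1,  2,  3],
              [ 4,  5,  6,  7],
              [ 8,  9, 10, 11],
              [12, 13, 14, 15],
              [ 1,  1,  4,  5]])
index = [[1, 3], [2, 4, 5]]

def sumrowsby_index_v2_typed(a, index):
    # Hypothetical variant of Approach #2 that keeps a's dtype
    lens = np.array([len(i) for i in index])
    # Indicator matrix of shape (number of groups, number of rows in a)
    id_ar = np.zeros((len(lens), a.shape[0]), dtype=a.dtype)
    c = np.concatenate(index)                   # row indices of a
    r = np.repeat(np.arange(len(index)), lens)  # group number per index
    id_ar[r, c] = 1
    return id_ar.dot(a)

out = sumrowsby_index_v2_typed(a, index)
```

Note that id_ar[r, c] = 1 records each (group, row) pair once, so a row index repeated inside one group would be counted once here, whereas the reduceat approach would add that row twice. The indicator matrix also grows with the number of groups times the number of rows in a, which may matter for very large inputs.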