Tingiskhan Tingiskhan - 19 days ago 5
Python Question

Vectorized searchsorted numpy

Assume that I have two arrays

A
and
B
, where both
A
and
B
are
m x n
. My goal is now, for each row of
A
and
B
, to find where I should insert the elements of row
i
of
A
in the corresponding row of
B
. That is, I wish to apply
np.digitize
or
np.searchsorted
to each row of
A
and
B
.

My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?

Answer

We can add each row some offset as compared to the previous row. We would use the same offset for both arrays. The idea is to use np.searchsorted on flattened version of input arrays thereafter and thus each row from b would be restricted to find sorted positions in the corresponding row in a.

So, we would have a vectorized implementation like so -

def searchsorted2d(a,b):
    m,n = a.shape
    max_num = np.maximum(a.max(), b.max()) + 1
    r = max_num*np.arange(a.shape[0])[:,None]
    p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
    return p - n*(np.arange(m)[:,None])

Please note that this works on non-negative numbers. To make it work for negative numbers too, we just need to offset for the minimum numbers as well.

Runtime test -

In [173]: def searchsorted2d_loopy(a,b):
     ...:     out = np.zeros(a.shape,dtype=int)
     ...:     for i in range(len(a)):
     ...:         out[i] = np.searchsorted(a[i],b[i])
     ...:     return out
     ...: 

In [174]: # Setup input arrays
     ...: a = np.random.randint(11,99,(10000,20))
     ...: b = np.random.randint(11,99,(10000,20))
     ...: a = np.sort(a,1)
     ...: b = np.sort(b,1)
     ...: 

In [175]: np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out[175]: True

In [176]: %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop

In [177]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop
Comments