Tingiskhan - 1 year ago 122

Python Question

Assume that I have two arrays

`A`

`B`

`A`

`B`

`m x n`

`A`

`B`

`i`

`A`

`B`

`np.digitize`

`np.searchsorted`

`A`

`B`

My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?

Answer Source

We can add each row some offset as compared to the previous row. We would use the same offset for both arrays. The idea is to use `np.searchsorted`

on flattened version of input arrays thereafter and thus each row from `b`

would be restricted to find sorted positions in the corresponding row in `a`

.

So, we would have a vectorized implementation like so -

```
def searchsorted2d(a,b):
m,n = a.shape
max_num = np.maximum(a.max(), b.max()) + 1
r = max_num*np.arange(a.shape[0])[:,None]
p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
return p - n*(np.arange(m)[:,None])
```

Please note that this works on `non-negative numbers`

. To make it work for negative numbers too, we just need to offset for the minimum numbers as well.

Runtime test -

```
In [173]: def searchsorted2d_loopy(a,b):
...: out = np.zeros(a.shape,dtype=int)
...: for i in range(len(a)):
...: out[i] = np.searchsorted(a[i],b[i])
...: return out
...:
In [174]: # Setup input arrays
...: a = np.random.randint(11,99,(10000,20))
...: b = np.random.randint(11,99,(10000,20))
...: a = np.sort(a,1)
...: b = np.sort(b,1)
...:
In [175]: np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out[175]: True
In [176]: %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop
In [177]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop
```