Tingiskhan - 1 year ago 152

Python Question

Assume that I have two arrays

`A`

`B`

`A`

`B`

`m x n`

`A`

`B`

`i`

`A`

`B`

`np.digitize`

`np.searchsorted`

`A`

`B`

My naive solution is to simply iterate over the rows. However, this is far too slow for my application. My question is therefore: is there a vectorized implementation of either algorithm that I haven't managed to find?

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer Source

We can add each row some offset as compared to the previous row. We would use the same offset for both arrays. The idea is to use `np.searchsorted`

on flattened version of input arrays thereafter and thus each row from `b`

would be restricted to find sorted positions in the corresponding row in `a`

.

So, we would have a vectorized implementation like so -

```
def searchsorted2d(a,b):
m,n = a.shape
max_num = np.maximum(a.max(), b.max()) + 1
r = max_num*np.arange(a.shape[0])[:,None]
p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
return p - n*(np.arange(m)[:,None])
```

Please note that this works on `non-negative numbers`

. To make it work for negative numbers too, we just need to offset for the minimum numbers as well.

Runtime test -

```
In [173]: def searchsorted2d_loopy(a,b):
...: out = np.zeros(a.shape,dtype=int)
...: for i in range(len(a)):
...: out[i] = np.searchsorted(a[i],b[i])
...: return out
...:
In [174]: # Setup input arrays
...: a = np.random.randint(11,99,(10000,20))
...: b = np.random.randint(11,99,(10000,20))
...: a = np.sort(a,1)
...: b = np.sort(b,1)
...:
In [175]: np.allclose(searchsorted2d(a,b),searchsorted2d_loopy(a,b))
Out[175]: True
In [176]: %timeit searchsorted2d_loopy(a,b)
10 loops, best of 3: 28.6 ms per loop
In [177]: %timeit searchsorted2d(a,b)
100 loops, best of 3: 13.7 ms per loop
```

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**