Physicist - 1 year ago 56

Python Question

I have an array

`A`

`B`

`A`

`B`

`A = [2,100,300,793,1300,1500,1810,2400]`

B = [4,305,789,1234,1890]

`B`

`A`

`A`

`A`

`B`

`A`

`B`

`A`

`A'=[2,300,793,1300,1810]`

`100,1500,2400`

Answer Source

**Approach #1:** With `NumPy broadcasting`

, we can look for absolute element-wise subtractions between the input arrays and use an appropriate threshold to filter out unwanted elements from `A`

. It seems for the given sample inputs, a threshold of `90`

works.

Thus, we would have an implementation, like so -

```
thresh = 90
Aout = A[(np.abs(A[:,None] - B) < thresh).any(1)]
```

Sample run -

```
In [69]: A
Out[69]: array([ 2, 100, 300, 793, 1300, 1500, 1810, 2400])
In [70]: B
Out[70]: array([ 4, 305, 789, 1234, 1890])
In [71]: A[(np.abs(A[:,None] - B) < 90).any(1)]
Out[71]: array([ 2, 300, 793, 1300, 1810])
```

**Approach #2:** Based on `this post`

, here's a memory efficient approach using `np.searchsorted`

, which could be crucial for large arrays -

```
def searchsorted_filter(a, choices, thresh):
lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
cl = np.take(choices,lidx) # Or choices[lidx]
cr = np.take(choices,ridx) # Or choices[ridx]
return A[np.minimum(np.abs(A - cl), np.abs(A - cr)) < thresh]
```

Sample run -

```
In [95]: searchsorted_filter(A,B, thresh = 90)
Out[95]: array([ 2, 300, 793, 1300, 1810])
```

**Runtime test**

```
In [104]: A = np.sort(np.random.randint(0,100000,(1000)))
In [105]: B = np.sort(np.random.randint(0,100000,(400)))
In [106]: out1 = A[(np.abs(A[:,None] - B) < 10).any(1)]
In [107]: out2 = searchsorted_filter(A,B, thresh = 10)
In [108]: np.allclose(out1, out2) # Verify results
Out[108]: True
In [109]: %timeit A[(np.abs(A[:,None] - B) < 10).any(1)]
100 loops, best of 3: 2.74 ms per loop
In [110]: %timeit searchsorted_filter(A,B, thresh = 10)
10000 loops, best of 3: 85.3 µs per loop
```