DilithiumMatrix - 5 months ago 15
Python Question

# Find elements of array one nearest to elements of array two

This answer explains how to find the nearest (sorted) array element to a single point, in a manner efficient for large arrays (slightly modified):

``````def arg_nearest(array, value):
idx = np.searchsorted(array, value, side="left")
if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
return idx-1
else:
return idx
``````

If, instead, we want to find the array elements nearest a set of points (i.e. a second array); are there efficient (by speed, for large arrays) ways of extending this besides using a for-loop?

Some test cases:

``````>>> xx = [0.2, 0.8, 1.3, 1.5, 2.0, 3.1, 3.8, 3.9, 4.5, 5.1, 5.5]
>>> yy = [1, 2, 3, 4, 5]
>>> of_x_nearest_y(xx, yy)
[0.5, 2.0, 3.1, 3.9, 5.1]

>>> xx = [0.2, 0.8, 1.3, 1.5, 2.0, 3.1, 3.8, 3.9, 4.5, 5.1, 5.5]
>>> yy = [-2, -1, 4.6, 5.8]
>>> of_x_nearest_y(xx, yy)
[0.2, 0.2, 4.5, 5.5]
``````

Edit: assuming both arrays are sorted, you can do a little better than a completely naive for-loop by excluding values below those already matched, i.e.

``````def args_nearest(options, targets):
locs = np.zeros(targets.size, dtype=int)
prev = 0
for ii, tt in enumerate(targets):
locs[ii] = prev + arg_nearest(options[prev:], tt)
prev = locs[ii]
return locs
``````

You can make few changes to extend it for an array of elements in `value`, like so -

``````idx = np.searchsorted(xx, yy, side="left").clip(max=xx.size-1)
mask = (idx > 0) &  \
( (idx == len(xx)) | (np.fabs(yy - xx[idx-1]) < np.fabs(yy - xx[idx])) )
``````

Explanation

Nomenclature : `array` is the array in which we are looking to place elements from `value` to maintain the sorted nature of `array`.

Changes needed to extend the solution for a single element to many elements for searching :

1] Clip the indices array `idx` obtained from `np.searchsorted` at a max. of `array.size-1`, because for elements in `value` that are larger than the maximum of `array`, we need to make `idx` indexable by `array`.

2] Introduce `numpy` to replace `math` to do those operations in a vectorized manner.

3] Replace the conditional statement by the trick of `idx - mask`. In this case, internally Python would up-convert `mask` to an `int` array to match up with the datatype of `idx`. Thus, all the `True` elements become `1` and thus for `True` elements we would effectively have `idx-1`, which is the `True` case of the IF conditional statement in the original code.