robintw robintw - 5 months ago 48
Python Question

Find unique elements of floating point array in numpy (with comparison using a delta value)

I've got an ndarray of floating point values in numpy and I want to find the unique values of this array. Of course, this has problems because of floating point accuracy... so I want to be able to set a delta value to use for the comparisons when working out which elements are unique.

Is there a way to do this? At the moment I am simply doing:

unique(array)


Which gives me something like:

array([ -Inf, 0.62962963, 0.62962963, 0.62962963, 0.62962963,
0.62962963])


where the values that look the same (to the number of decimal places being displayed) are obviously slightly different.

Answer

Don't floor and round both fail the OP's requirement in some cases?

np.floor([5.99999999, 6.0])      # array([ 5.,  6.])
np.round([6.50000001, 6.5], 0)   # array([ 7.,  6.])

The way I would do it is something like the following. It may not be optimal, and it is surely slower than the other answers:

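import numpy as np
TOL = 1.0e-3
a = np.random.random((10,10))
i = np.argsort(a.flat)   # indices that sort the flattened array
d = np.diff(a.flat[i])   # gaps between sorted neighbours (n - 1 entries)
# d is one entry shorter than i, so trim i before applying the boolean mask;
# as a consequence the last (largest) sorted value is never selected
result = a.flat[i[:-1][d > TOL]]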

Of course, this method excludes all but the largest member of any run of values in which each value is within the tolerance of its neighbour. This means you may not find any unique values in an array if all neighbouring values are sufficiently close, even though max - min is larger than the tolerance.
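To make that caveat concrete, here is a small sketch with made-up data (the linspace values are an assumption chosen for illustration, not the OP's array). Every neighbouring gap is below the tolerance, so nothing is selected even though the overall spread is ten times larger than TOL:

```python
import numpy as np

TOL = 1.0e-3
# Hypothetical data: 21 values spaced 0.0005 apart, so every gap is below TOL
a = np.linspace(0.0, 0.01, 21)
i = np.argsort(a)
d = np.diff(a[i])
result = a[i[:-1][d > TOL]]   # trim i so it matches the length of d

print(result.size)               # no value is followed by a gap larger than TOL
print(a.max() - a.min() > TOL)   # yet the overall spread exceeds TOL
```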

Here is essentially the same algorithm, but it is easier to understand and should be faster, as it avoids an indexing step:

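a = np.random.random((10,))
b = np.sort(a)            # sorted copy; a is left untouched
d = np.diff(b)            # gaps between sorted neighbours (n - 1 entries)
result = b[:-1][d > TOL]  # trim b so it matches the length of d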
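If you want one representative per run, including the final run, a variation on the idea above could be wrapped up as follows. This is only a sketch: the function name unique_within_tol is hypothetical, and unlike the snippets above it keeps the smallest member of each run rather than the largest. Runs are still chained, so a long sequence of closely spaced values collapses to a single representative.

```python
import numpy as np

def unique_within_tol(values, tol):
    """Return the smallest member of each run of sorted values whose
    neighbouring gaps are all at most tol (hypothetical helper)."""
    b = np.sort(np.asarray(values, dtype=float).ravel())
    if b.size == 0:
        return b
    # A new run starts at the first element and wherever the gap exceeds tol
    starts = np.concatenate(([True], np.diff(b) > tol))
    return b[starts]

example = unique_within_tol(
    [0.62962963, 0.629629631, 0.5, 0.6296296299, 1.0], 1.0e-3
)
print(example)
```

Because the first element always starts a run, the result is never empty for a non-empty input.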

The OP may also want to look into scipy.cluster (for a fancy version of this method) or numpy.digitize (for a fancy version of the other two methods).
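As a rough idea of what the numpy.digitize route might look like, here is a minimal sketch. The bin layout (edges every TOL, starting at the minimum) and the sample data are assumptions made for illustration. Like floor and round, fixed bins can still split two close values that happen to straddle a bin edge.

```python
import numpy as np

TOL = 1.0e-3
# Hypothetical sample data
a = np.array([0.62962963, 0.629629631, 0.5, 1.0])

# Assumed bin layout: edges every TOL across the data range
edges = np.arange(a.min(), a.max() + TOL, TOL)
bins = np.digitize(a, edges)   # bin index for each value

# Keep the first value encountered in each occupied bin
_, first = np.unique(bins, return_index=True)
result = a[np.sort(first)]
print(result)
```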
