thomleo - 29 days ago 5
Python Question

# Working with floating point numbers for comparison and related operations in NumPy

I have an array of random floats and I need to compare it to another one that has the same values in a different order. For that matter I use the sum, product (and other combinations depending on the dimension of the table hence the number of equations needed).

Nevertheless, I encountered a precision issue when I perform the sum (or product) on the array depending on the order of the values.

Here is a simple standalone example to illustrate this issue :

``````import numpy as np

n = 10
m = 4

tag = np.random.rand(n, m)

s1 = np.sum(tag, axis=1)
s2 = np.sum(tag[:, ::-1], axis=1)

# print the number of times s1 is not equal to s2 (should be 0)
print np.nonzero(s1 != s2)[0].shape[0]
``````

If you execute this code it sometimes tells you that
`s1`
and
`s2`
are not equal and the differents is of magnitude of the computer precision.

The problem is I need to use those in functions like
`np.in1d`
where I can't really give a tolerance...

Is there a way to avoid this issue?

For the listed code, you can use `np.isclose` and with it tolerance values could be specified too.

Using the provided sample, let's see how it could be used -

``````In [201]: n = 10
...: m = 4
...:
...: tag = np.random.rand(n, m)
...:
...: s1 = np.sum(tag, axis=1)
...: s2 = np.sum(tag[:, ::-1], axis=1)
...:

In [202]: np.nonzero(s1 != s2)[0].shape[0]
Out[202]: 4

In [203]: (~np.isclose(s1,s2)).sum() # So, all matches!
Out[203]: 0
``````

To make use of tolerance values in other scenarios, we need to work on a case-by-case basis. So, let's say for an implementation that involve elementwise comparison like in `np.in1d`, we can bring in `broadcasting` to do those elementwise equality checks for all elems in first input against all elems in the second one. Then, we use `np.abs` to get the "closeness factor" and finally compare against the input tolerance to decide the matches. As needed to simulate `np.in1d`, we do ANY operation along one of the axis. Thus, `np.in1d` with tolerance using `broadcasting` could be implemented like so -

``````def in1d_with_tolerance(A,B,tol=1e-05):
return (np.abs(A[:,None] - B) < tol).any(1)
``````

As suggested in the comments by OP, we can also round floating-pt numbers after scaling them up and this should be memory efficient, as being needed for working with large arrays. So, a modified version would be like so -

``````def in1d_with_tolerance_v2(A,B,tol=1e-05):
S = round(1/tol)
return np.in1d(np.around(A*S).astype(int),np.around(B*S).astype(int))
``````

Sample run -

``````In [372]: A = np.random.rand(5)
...: B = np.random.rand(7)
...: B[3] = A[1] + 0.0000008
...: B[6] = A[4] - 0.0000007
...:

In [373]: np.in1d(A,B) # Not the result we want!
Out[373]: array([False, False, False, False, False], dtype=bool)

In [374]: in1d_with_tolerance(A,B)
Out[374]: array([False,  True, False, False,  True], dtype=bool)

In [375]: in1d_with_tolerance_v2(A,B)
Out[375]: array([False,  True, False, False,  True], dtype=bool)
``````

Finally, on how to make it work for other implementations and use cases - It would depend on the implementation itself. But for most cases, `np.isclose` and `broadcasting` should help.