thomleo - 6 months ago 30

Python Question

I have an array of random floats and I need to compare it to another one that has the same values in a different order. For that matter I use the sum, product (and other combinations depending on the dimension of the table hence the number of equations needed).

Nevertheless, I encountered a precision issue when I perform the sum (or product) on the array depending on the order of the values.

Here is a simple standalone example to illustrate this issue :

`import numpy as np`

n = 10

m = 4

tag = np.random.rand(n, m)

s1 = np.sum(tag, axis=1)

s2 = np.sum(tag[:, ::-1], axis=1)

# print the number of times s1 is not equal to s2 (should be 0)

print np.nonzero(s1 != s2)[0].shape[0]

If you execute this code it sometimes tells you that

`s1`

`s2`

The problem is I need to use those in functions like

`np.in1d`

Is there a way to avoid this issue?

Answer

For the listed code, you can use `np.isclose`

and with it tolerance values could be specified too.

Using the provided sample, let's see how it could be used -

```
In [201]: n = 10
...: m = 4
...:
...: tag = np.random.rand(n, m)
...:
...: s1 = np.sum(tag, axis=1)
...: s2 = np.sum(tag[:, ::-1], axis=1)
...:
In [202]: np.nonzero(s1 != s2)[0].shape[0]
Out[202]: 4
In [203]: (~np.isclose(s1,s2)).sum() # So, all matches!
Out[203]: 0
```

To make use of tolerance values in other scenarios, we need to work on a case-by-case basis. So, let's say for an implementation that involve elementwise comparison like in `np.in1d`

, we can bring in `broadcasting`

to do those elementwise equality checks for all elems in first input against all elems in the second one. Then, we use `np.abs`

to get the "closeness factor" and finally compare against the input tolerance to decide the matches. As needed to simulate `np.in1d`

, we do ANY operation along one of the axis. Thus, `np.in1d`

with tolerance using `broadcasting`

could be implemented like so -

```
def in1d_with_tolerance(A,B,tol=1e-05):
return (np.abs(A[:,None] - B) < tol).any(1)
```

As suggested in the comments by OP, we can also round floating-pt numbers after scaling them up and this should be memory efficient, as being needed for working with large arrays. So, a modified version would be like so -

```
def in1d_with_tolerance_v2(A,B,tol=1e-05):
S = round(1/tol)
return np.in1d(np.around(A*S).astype(int),np.around(B*S).astype(int))
```

Sample run -

```
In [372]: A = np.random.rand(5)
...: B = np.random.rand(7)
...: B[3] = A[1] + 0.0000008
...: B[6] = A[4] - 0.0000007
...:
In [373]: np.in1d(A,B) # Not the result we want!
Out[373]: array([False, False, False, False, False], dtype=bool)
In [374]: in1d_with_tolerance(A,B)
Out[374]: array([False, True, False, False, True], dtype=bool)
In [375]: in1d_with_tolerance_v2(A,B)
Out[375]: array([False, True, False, False, True], dtype=bool)
```

Finally, on how to make it work for other implementations and use cases - It would depend on the implementation itself. But for most cases, `np.isclose`

and `broadcasting`

should help.