I have an array of random floats and I need to compare it to another one that has the same values in a different order. For that matter I use the sum, product (and other combinations depending on the dimension of the table hence the number of equations needed).
Nevertheless, I encountered a precision issue when I perform the sum (or product) on the array depending on the order of the values.
Here is a simple standalone example to illustrate this issue :
import numpy as np
n = 10
m = 4
tag = np.random.rand(n, m)
s1 = np.sum(tag, axis=1)
s2 = np.sum(tag[:, ::-1], axis=1)
# print the number of times s1 is not equal to s2 (should be 0)
print np.nonzero(s1 != s2).shape
For the listed code, you can use
np.isclose and with it tolerance values could be specified too.
Using the provided sample, let's see how it could be used -
In : n = 10 ...: m = 4 ...: ...: tag = np.random.rand(n, m) ...: ...: s1 = np.sum(tag, axis=1) ...: s2 = np.sum(tag[:, ::-1], axis=1) ...: In : np.nonzero(s1 != s2).shape Out: 4 In : (~np.isclose(s1,s2)).sum() # So, all matches! Out: 0
To make use of tolerance values in other scenarios, we need to work on a case-by-case basis. So, let's say for an implementation that involve elementwise comparison like in
np.in1d, we can bring in
broadcasting to do those elementwise equality checks for all elems in first input against all elems in the second one. Then, we use
np.abs to get the "closeness factor" and finally compare against the input tolerance to decide the matches. As needed to simulate
np.in1d, we do ANY operation along one of the axis. Thus,
np.in1d with tolerance using
broadcasting could be implemented like so -
def in1d_with_tolerance(A,B,tol=1e-05): return (np.abs(A[:,None] - B) < tol).any(1)
As suggested in the comments by OP, we can also round floating-pt numbers after scaling them up and this should be memory efficient, as being needed for working with large arrays. So, a modified version would be like so -
def in1d_with_tolerance_v2(A,B,tol=1e-05): S = round(1/tol) return np.in1d(np.around(A*S).astype(int),np.around(B*S).astype(int))
Sample run -
In : A = np.random.rand(5) ...: B = np.random.rand(7) ...: B = A + 0.0000008 ...: B = A - 0.0000007 ...: In : np.in1d(A,B) # Not the result we want! Out: array([False, False, False, False, False], dtype=bool) In : in1d_with_tolerance(A,B) Out: array([False, True, False, False, True], dtype=bool) In : in1d_with_tolerance_v2(A,B) Out: array([False, True, False, False, True], dtype=bool)
Finally, on how to make it work for other implementations and use cases - It would depend on the implementation itself. But for most cases,
broadcasting should help.