webmaker webmaker - 2 months ago 7
Python Question

Using pandas DataFrame.isin with a tolerance parameter

I reviewed the following posts beforehand. Is there a way to use DataFrame.isin() with an approximation factor or a tolerance value? If not, is there another method that can do this?

How to filter the DataFrame rows of pandas by "within"/"in"?

Use a list of values to select rows from a pandas dataframe

Example:

from pandas import DataFrame
df = DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})

In : df
Out:
     A    B
0  5.0  1.0
1  6.0  2.0
2  3.3  3.2
3  4.0  5.0

What I would like to write (isin() does not actually accept a tol argument):

df[df['A'].isin([3, 6], tol=.5)]

Desired output:
     A    B
1  6.0  2.0
2  3.3  3.2

Answer

You can do something similar with NumPy's isclose (assuming import numpy as np):

df[np.isclose(df['A'].values[:, None], [3, 6], atol=.5).any(axis=1)]
Out: 
     A    B
1  6.0  2.0
2  3.3  3.2

np.isclose returns this:

np.isclose(df['A'].values[:, None], [3, 6], atol=.5)
Out: 
array([[False, False],
       [False,  True],
       [ True, False],
       [False, False]], dtype=bool)

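This is a pairwise comparison of the elements of df['A'] with [3, 6]. The indexing df['A'].values[:, None] reshapes the column to shape (4, 1) so that it broadcasts against the list. Since you want to know whether each element is close to any of the values in the list, we call .any(axis=1) at the end. Note that np.isclose also applies a relative tolerance (rtol, default 1e-05) on top of atol; pass rtol=0 if the tolerance should be purely absolute.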
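
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the name isin_close and its default tolerance are made up for illustration and are not part of pandas or NumPy. It sets rtol=0 so that only the absolute tolerance applies, which is an assumption about what "tolerance" should mean here.

```python
import numpy as np
import pandas as pd


def isin_close(series, values, atol=0.5):
    """Return a boolean mask: True where an element is within atol of any value.

    isin_close is a hypothetical helper name, not a pandas API.
    """
    # A column vector of shape (n, 1) broadcast against values of shape (m,)
    # yields an (n, m) matrix of pairwise comparisons.
    column = series.to_numpy()[:, None]
    targets = np.asarray(values, dtype=float)
    # rtol=0 disables the relative term, so only atol decides closeness.
    close = np.isclose(column, targets, atol=atol, rtol=0)
    # A row matches if it is close to at least one of the target values.
    return pd.Series(close.any(axis=1), index=series.index)


df = pd.DataFrame({'A': [5, 6, 3.3, 4], 'B': [1, 2, 3.2, 5]})
result = df[isin_close(df['A'], [3, 6], atol=0.5)]
print(result)
```

On the example frame this selects rows 1 and 2, the same rows as the one-liner above. Returning a Series that shares the original index means the mask can be combined with other conditions using & and |.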