David David - 1 year ago 142
Python Question

Pandas selecting row by column value, strange behaviour

Ok, I have a pandas dataframe like this:

         lat     long  level        date   time  value
3341  29.232  -15.652   10.0  20100109.0  700.0    0.5
3342  27.887  -13.668  120.0  20100109.0  700.0    3.2
3899  26.345  -11.234    0.0  20100109.0  700.0    5.8

The strange index numbers are because the dataframe comes from a CSV converted to a pandas dataframe with some rows filtered out. The columns are not really relevant here.

I am trying to see some rows by filtering on latitude, so (if the dataframe is c) I do:

c[c['lat'] == 26.345]

or

c.loc[c['lat'] == 26.345]

and I can see whether the value is present or not, but sometimes it outputs nothing for latitude values that I can see in the dataframe!? For instance, I can see latitude 27.702 in the dataframe, but when I do

c[c['lat'] == 27.702]
c.loc[c['lat'] == 27.702]

I get an empty dataframe, even though I am seeing the value for that latitude. What is happening here?

Thank you.

Answer Source

This is probably because you are asking for an exact match against floating-point values, which is very, very dangerous. They are approximations, often printed to less precision than is actually stored.

It's very easy to see 0.735471 printed, say, and think that's all there is, when in fact the value is really 0.73547122072282867; the display function has simply truncated the result. But when you try a strict equality test on the attractively short value, boom. Doesn't work.
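A small sketch of this trap, using a hypothetical dataframe (the classic 0.1 + 0.2 value stands in for a computed latitude; the 27.702 row is just filler):

```python
import numpy as np
import pandas as pd

# The stored float is 0.30000000000000004, not 0.3, because it was
# computed rather than typed in -- but the default display hides that.
c = pd.DataFrame({"lat": [0.1 + 0.2, 27.702]})

print(c["lat"].iloc[0])                    # 0.30000000000000004
print(len(c[c["lat"] == 0.3]))             # 0 -- exact equality misses it
print(len(c[np.isclose(c["lat"], 0.3)]))   # 1 -- tolerance-based match finds it
```

The value looks like 0.3 when the dataframe is printed, yet the strict equality test fails, which is exactly the behaviour described in the question.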

So instead of

c[c['lat'] == 26.345]

use

import numpy as np

c[np.isclose(c['lat'], 26.345)]

Now you'll get the rows whose values fall within a tolerance of the value you specified. The tolerance is configurable via the rtol (relative) and atol (absolute) arguments of np.isclose.
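For example, here is how the tolerance arguments change which values match (the 26.3450004 value is a made-up near-duplicate for illustration):

```python
import numpy as np

lats = np.array([26.345, 26.3450004, 27.702])

# Default tolerances (rtol=1e-05, atol=1e-08) accept the near-duplicate:
print(np.isclose(lats, 26.345))                        # [ True  True False]

# Tightening to a purely absolute tolerance rejects it:
print(np.isclose(lats, 26.345, rtol=0.0, atol=1e-07))  # [ True False False]
```

np.isclose treats two values as equal when abs(a - b) <= atol + rtol * abs(b), so the default relative term dominates for values of this magnitude.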
