PanchoVarallo - 7 months ago
Python Question

Expressions with "== True" and "is True" give different results

I have the following MCVE:

#!/usr/bin/env python3

import pandas as pd

df = pd.DataFrame([True, False, True])

print("Whole DataFrame:")
print(df)

print("\nFiltered DataFrame:")
print(df[df[0] == True])


The output is the following, which I expected:

Whole DataFrame:
       0
0   True
1  False
2   True

Filtered DataFrame:
      0
0  True
2  True


Okay, but the PEP8 style seems to be wrong here; the checker says: E712 comparison to True should be 'if cond is True:' or 'if cond:'. So I changed == True to is True, but now it fails. The output is:

Whole DataFrame:
       0
0   True
1  False
2   True

Filtered DataFrame:
0     True
1    False
2     True
Name: 0, dtype: bool


What is going on?

Answer

The catch here is that in df[df[0] == True], you are not comparing objects to True.

As the other answers say, == is overloaded in pandas to produce a Series instead of the single bool it normally returns. [] is overloaded, too: given a boolean Series, it treats it as a mask and returns the filtered result. The code is essentially equivalent to:

series = df[0].__eq__(True)
df.__getitem__(series)

So, you're not violating PEP8 by leaving == here.
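
To make the difference concrete, here is a small sketch built on the DataFrame from the question. It only inspects what each expression evaluates to; the variable names (mask, identity, filtered) are chosen here for illustration. Since the column is already boolean, it can probably also be used as the mask directly, which avoids the comparison the style checker complains about.

```python
import pandas as pd

df = pd.DataFrame([True, False, True])

# == is overloaded: the comparison runs element-wise and returns a Series.
mask = df[0] == True  # noqa: E712
print(type(mask).__name__)      # Series

# `is` cannot be overloaded: it tests object identity and returns a plain bool.
identity = df[0] is True
print(identity)                 # False

# The column is already boolean, so it can serve as the mask itself.
filtered = df[df[0]]
print(filtered.index.tolist())  # [0, 2]
```

With is True, the indexing operator receives a plain False rather than a mask, so no row filtering takes place at all.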


Essentially, pandas gives familiar syntax unusual semantics - that is what caused the confusion.
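
The mechanism can be sketched without pandas. The class below, TinySeries, is a hypothetical toy written for this illustration and is not how pandas is implemented; it only shows how overloading == and [] produces mask-style filtering, and why is stays outside the class's control.

```python
class TinySeries:
    """Hypothetical stand-in for a pandas Series, for illustration only."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Element-wise comparison: returns a new TinySeries, not a bool.
        return TinySeries(v == other for v in self.values)

    def __getitem__(self, key):
        # A TinySeries key is treated as a boolean mask.
        if isinstance(key, TinySeries):
            return TinySeries(
                v for v, keep in zip(self.values, key.values) if keep
            )
        # Anything else is treated as a plain list position.
        return self.values[key]


s = TinySeries([10, 20, 10])
mask = s == 10

print(mask.values)        # [True, False, True]
print(s[mask].values)     # [10, 10]

# `is` checks identity, so a TinySeries is never `True` itself.
print(mask is True)       # False

# The resulting False is then used as an ordinary key; in plain Python
# False == 0, so the toy class returns the element at position 0.
print(s[mask is True])    # 10
```

The last line hints at what may have happened in the question: the False produced by is True appears to have been treated as the label 0, so the whole column came back unfiltered. This is an interpretation of the output shown, and other pandas versions may handle such a key differently.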

According to Stroustrup (sec. 3.3.3), operator overloading has been causing this kind of trouble ever since its invention (and he had to think hard about whether to include it in C++). Seeing even more abuse of it in C++, Gosling ran to the other extreme in Java and banned it completely, and that proved to be exactly that: an extreme.

In conclusion, modern languages and codebases tend to support operator overloading, but take care not to overuse it and to keep its semantics consistent.