jxramos jxramos - 3 years ago 62
Python Question

What are the rules for pandas' elementwise binary boolean operators on same-length Series with differing indexes?

I have been applying some binary boolean operators throughout my code base and came across a bug that really surprised me. I've reconstructed a minimal working example to demonstrate the behavior below...

import pandas
s = pandas.Series( [True]*4 )
d = pandas.DataFrame( { 'a':[True, False, True, False] , 'b':[True]*4 } )

print(d)
a b
0 True True
1 False True
2 True True
3 False True

print( s[0:2] )
0 True
1 True
dtype: bool

print( d.loc[ d['a'] , 'b' ] )
0 True
2 True
dtype: bool

print( s[0:2] & d.loc[ d['a'] , 'b' ] )
0 True
1 False
2 False


The result of this last statement caught me entirely by surprise: it has 3 elements rather than 2. Realizing that the indexes were influencing the outcome, I manually reset them to get the result I expected.

s[0:2].reset_index(drop=True) & d.loc[ d['a'] , 'b' ].reset_index( drop=True )
0 True
1 True
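
For anyone who wants the positional pairing, here is a minimal sketch of two ways to get it, assuming both operands have the same length. The names `left`, `right`, `by_position` and `keep_left_index` are only illustrative, and using `.to_numpy()` to drop the labels on one side is one option among several.

```python
import pandas

s = pandas.Series([True] * 4)
d = pandas.DataFrame({'a': [True, False, True, False], 'b': [True] * 4})

left = s[0:2]                 # index labels 0 and 1
right = d.loc[d['a'], 'b']    # index labels 0 and 2

# Option 1: discard both indexes so the values are paired by position.
by_position = left.reset_index(drop=True) & right.reset_index(drop=True)

# Option 2: keep left's index and combine it with right's raw values,
# which carry no labels and therefore cannot trigger alignment.
keep_left_index = left & right.to_numpy()

print(by_position)
print(keep_left_index)
```

Option 2 keeps the labels of the left operand, which may or may not be what you want; both options silently assume the two sides really do correspond row for row.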


Needless to say, I'll need to revisit the documentation to understand how the indexing rules apply here. Can anyone explain step by step how this operator behaves with mismatched indexes?

=============================================

Just to add a comparison for those coming from a similar R background, R's data.frame equivalent operation yields what I'd expect...

> a = c(TRUE,FALSE,TRUE,FALSE)
> b = c(TRUE,TRUE,TRUE,TRUE)
>
> d = data.frame( a, b )
> d
a b
1 TRUE TRUE
2 FALSE TRUE
3 TRUE TRUE
4 FALSE TRUE
> s = c( TRUE,TRUE,TRUE,TRUE)
> s
[1] TRUE TRUE TRUE TRUE
>
> d[ d$a , 'b']
[1] TRUE TRUE
>
> s[0:2]
[1] TRUE TRUE
> s[0:2] & d[ d$a , 'b']
[1] TRUE TRUE

Answer

You are combining two Series with different indices:

s[0:2]

0    True
1    True
dtype: bool

and

d.loc[ d['a'] , 'b']

0    True
2    True
dtype: bool

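pandas first aligns the two indices, taking the union of their labels, and only then applies the operator label by label. A label that exists in only one of the operands has no partner on the other side, and the result for it is False.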

s[0:2] & d.loc[ d['a'] , 'b']

0     True  # True from both indices therefore True
1    False  # Only True from s[0:2] and missing from other therefore False
2    False  # Only True from d and missing from other therefore False
dtype: bool
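
To make the alignment step concrete, here is a rough sketch that reproduces the same result by hand. It builds the two Series directly with the indexes shown above, and the names `labels`, `left_aligned`, `right_aligned`, `manual` and `direct` are only illustrative. Filling the missing labels with False is an assumption chosen to match the output above; it is not necessarily how pandas implements this internally.

```python
import pandas

# Same values and index labels as s[0:2] and d.loc[d['a'], 'b'].
left = pandas.Series([True, True], index=[0, 1])
right = pandas.Series([True, True], index=[0, 2])

# Step 1: take the union of the two indexes.
labels = left.index.union(right.index)

# Step 2: reindex both sides onto that union; labels that a side
# does not have are filled with False (assumption, see above).
left_aligned = left.reindex(labels, fill_value=False)
right_aligned = right.reindex(labels, fill_value=False)

# Step 3: both sides now share one index, so & pairs them label by label.
manual = left_aligned & right_aligned

# What pandas produces when it aligns for you.
direct = left & right

print(manual)
print(direct)
```

If the two indexes are expected to match, checking `left.index.equals(right.index)` before combining is a cheap way to catch this kind of surprise early.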