kit kit - 2 months ago 9
Python Question

How to compare four columns of a pandas DataFrame at a time?

I have the following DataFrame:

  Symbol1  BB Symbol2  CC
0     ABC   1     ABC   1
1     PQR   1     PQR   1
2     CPC   2     CPC   0
3     CPC   2     CPC   1
4     CPC   2     CPC   2
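To reproduce the problem, the DataFrame above can be rebuilt as follows. This is a sketch that assumes BB and CC hold integers and the symbol columns hold strings; the real dtypes are not stated in the question.

```python
import pandas as pd

# Rebuild the example DataFrame (assumption: BB and CC are integer columns)
df = pd.DataFrame({
    'Symbol1': ['ABC', 'PQR', 'CPC', 'CPC', 'CPC'],
    'BB': [1, 1, 2, 2, 2],
    'Symbol2': ['ABC', 'PQR', 'CPC', 'CPC', 'CPC'],
    'CC': [1, 1, 0, 1, 2],
})
print(df)
```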


I want to compare Symbol1 with Symbol2 and BB with CC. If both pairs are equal, I want to keep that row; all other rows must be removed from the DataFrame.

Expected result:

  Symbol1  BB Symbol2  CC
0     ABC   1     ABC   1
1     PQR   1     PQR   1
2     CPC   2     CPC   2


For a single condition, I'm using:

df = df[df['BB'] == '2'].copy()


This works fine. But when I combine both conditions like this:

df = df[df['BB'] == df['CC'] and df['Symbol1'] == df['Symbol2']].copy()


I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
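The error comes from Python's and, which needs a single True or False for its left operand and therefore calls bool() on the whole Series. pandas refuses, because a Series of several booleans has no single truth value. A minimal reproduction with two small, made-up boolean Series:

```python
import pandas as pd

left = pd.Series([True, False])
right = pd.Series([True, True])

# `and` asks for the truth value of the whole Series, which raises ValueError
try:
    result = left and right
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# `&` combines the two Series element by element instead
combined = left & right
print(combined.tolist())  # [True, False]
```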


How can I compare both pairs of columns and get the expected result?

Answer

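You can use boolean indexing and combine the conditions with & instead of and. Python's and tries to reduce each Series to a single True or False, which is what raises the ValueError, whereas & combines the masks element by element. Wrap each comparison in parentheses, because & binds more tightly than ==: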

print((df.Symbol1 == df.Symbol2) & (df.BB == df.CC))
0     True
1     True
2    False
3    False
4     True
dtype: bool

print(df[(df.Symbol1 == df.Symbol2) & (df.BB == df.CC)])
  Symbol1  BB Symbol2  CC
0     ABC   1     ABC   1
1     PQR   1     PQR   1
4     CPC   2     CPC   2
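Note that the filtered rows keep their original index labels (0, 1, 4), while the question's expected result is numbered 0, 1, 2. A self-contained sketch of the answer follows, with one extra step not in the answer: reset_index(drop=True) to renumber the rows. It also shows DataFrame.query as an alternative, where and is allowed because pandas parses the query string itself rather than Python evaluating it.

```python
import pandas as pd

df = pd.DataFrame({
    'Symbol1': ['ABC', 'PQR', 'CPC', 'CPC', 'CPC'],
    'BB': [1, 1, 2, 2, 2],
    'Symbol2': ['ABC', 'PQR', 'CPC', 'CPC', 'CPC'],
    'CC': [1, 1, 0, 1, 2],
})

# Each comparison is parenthesized because & binds more tightly than ==
mask = (df['Symbol1'] == df['Symbol2']) & (df['BB'] == df['CC'])
result = df[mask].copy()
print(result)  # keeps the original index labels 0, 1, 4

# Optional extra step: renumber rows 0..n-1 to match the question's expected result
renumbered = result.reset_index(drop=True)
print(renumbered)

# Alternative: query() parses its own expression, so `and` works inside the string
via_query = df.query('Symbol1 == Symbol2 and BB == CC')
print(via_query.index.tolist())  # [0, 1, 4]
```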