user1804633 - 1 month ago
Python Question

Pandas "Can only compare identically-labeled DataFrame objects" error

I'm using Pandas to compare the outputs of two files loaded into two data frames (uat, prod):
...

uat = uat[['Customer Number','Product']]
prod = prod[['Customer Number','Product']]
print(uat['Customer Number'] == prod['Customer Number'])
print(uat['Product'] == prod['Product'])
print(uat == prod)

The first two match exactly:
74357 True
74356 True
Name: Customer Number, dtype: bool
74357 True
74356 True
Name: Product, dtype: bool


For the third print, I get an error:
Can only compare identically-labeled DataFrame objects

If the first two compared fine, what's wrong with the 3rd?

Thanks

Answer

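The error is raised when the labels of the two objects are not identical, and labels that match but appear in a different order count as different. Here's a small example to demonstrate this (it applied only to DataFrames, not Series, until pandas 0.19; since then it applies to both):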

In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])

In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
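
To see which axis is responsible before fixing anything, you can compare the labels directly. This is only a minimal sketch: label_report is a hypothetical helper name, not part of pandas, and it relies on Index.equals, which treats a different ordering of the same labels as unequal.

```python
import pandas as pd


def label_report(left, right):
    """Return (rows_match, columns_match) for two DataFrames.

    Hypothetical helper for illustration; Index.equals is order-sensitive.
    """
    return left.index.equals(right.index), left.columns.equals(right.columns)


df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

rows_match, columns_match = label_report(df1, df2)
print(rows_match, columns_match)

# The row labels are the same set, just in a different order.
same_label_set = set(df1.index) == set(df2.index)
print(same_label_set)
```

If the rows are reported as not matching while the label sets are equal, the frames most likely differ only in row order, which is the case the sorting approach addresses.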

One solution is to sort the index first (Note: some functions require sorted indexes):

In [4]: df2.sort_index(inplace=True)

In [5]: df1 == df2
Out[5]: 
      0     1
0  True  True
1  True  True

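You could also do this with sort_index, which returns a sorted copy rather than sorting in place (the older DataFrame.sort method has since been removed from pandas):

In [11]: df1.sort_index() == df2.sort_index()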
Out[11]: 
      0     1
0  True  True
1  True  True

Note: == is also sensitive to the order of columns, so you may need to sort both axes:

In [12]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[12]: 
      0     1
0  True  True
1  True  True
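
Putting the two sorts together, here is a rough sketch of a helper that aligns both axes before comparing. The name compare_aligned and the sample frames with columns a and b are made up for illustration, and it assumes both frames carry the same set of row and column labels.

```python
import pandas as pd


def compare_aligned(left, right):
    """Sort both frames by row and column labels, then compare element-wise.

    Hypothetical helper; assumes both frames have the same set of labels.
    """
    left_sorted = left.sort_index(axis=0).sort_index(axis=1)
    right_sorted = right.sort_index(axis=0).sort_index(axis=1)
    return left_sorted == right_sorted


# Same data, but with rows and columns in a different order.
df1 = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
df2 = pd.DataFrame({"b": [4, 2], "a": [3, 1]}, index=[1, 0])

result = compare_aligned(df1, df2)
print(result)

# Collapse to a single boolean: True only if every cell matched.
print(result.all().all())
```

Neither input is modified, since sort_index returns a new frame by default.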