user1804633 - 4 months ago 160

Python Question

I'm using Pandas to compare the outputs of two files loaded into two data frames (uat, prod):

...

```
uat = uat[['Customer Number','Product']]
prod = prod[['Customer Number','Product']]

print(uat['Customer Number'] == prod['Customer Number'])
print(uat['Product'] == prod['Product'])
print(uat == prod)
```

The first two match exactly:

```
74357    True
74356    True
Name: Customer Number, dtype: bool

74357    True
74356    True
Name: Product, dtype: bool
```

For the third print, I get an error:

```
Can only compare identically-labeled DataFrame objects
```

If the first two compared fine, what's wrong with the third?

Thanks

Answer

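`==` between two DataFrames compares them cell by cell, matched by label, and it only works when both frames have exactly the same row labels and the same column labels, in the same order. Here's a small example to demonstrate this (the check applied only to DataFrames, not Series, until pandas 0.19; since 0.19 it applies to both, which is why on older versions the Series comparisons succeed while the DataFrame comparison fails):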

```
In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])
In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
```
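To see why the comparison is refused, it can help to check the labels directly. The sketch below assumes a recent pandas version, where the error is raised as a `ValueError` rather than the bare `Exception` shown in the session above; `labels_match` is a hypothetical helper written for illustration, not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])


def labels_match(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper: True when `==` will accept the pair.

    Row labels and column labels must both be equal,
    element by element and in the same order.
    """
    return a.index.equals(b.index) and a.columns.equals(b.columns)


print(labels_match(df1, df2))  # False: rows are labelled [0, 1] vs [1, 0]

try:
    result = df1 == df2
except ValueError as exc:
    # Recent pandas raises ValueError for mismatched labels
    print("comparison refused:", exc)
```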

One solution is to sort the index first, so that the rows of both frames are in the same label order (note: some functions also require sorted indexes):

```
In [4]: df2.sort_index(inplace=True)
In [5]: df1 == df2
Out[5]:
      0     1
0  True  True
1  True  True
```
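`inplace=True` modifies `df2`. If you would rather leave both frames untouched, `sort_index()` without `inplace` returns a sorted copy. The element-wise result can then be collapsed to a single boolean; alternatively, `DataFrame.equals` gives one boolean directly and also takes the labels into account. A minimal sketch reusing the toy frames from the example above:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

# sort_index() returns a new, sorted DataFrame; df2 itself is unchanged
df2_sorted = df2.sort_index()

elementwise = df1 == df2_sorted             # DataFrame of booleans
all_equal = bool(elementwise.all().all())   # reduce over columns, then rows

# equals() returns False (instead of raising) when the labels differ,
# so the unsorted df2 does not compare equal
print(all_equal, df1.equals(df2_sorted), df1.equals(df2))  # True True False
```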

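You could also do this without modifying either frame. Older pandas used `df.sort(axis=0)` for this, but `DataFrame.sort` was deprecated in 0.17 and removed in 0.20; `sort_index` is the replacement:

```
In [11]: df1.sort_index(axis=0) == df2.sort_index(axis=0)
Out[11]:
      0     1
0  True  True
1  True  True
```

Note: `==` is also sensitive to the order of columns, so sort both axes if the column order might differ:

```
In [12]: df1.sort_index(axis=0).sort_index(axis=1) == df2.sort_index(axis=0).sort_index(axis=1)
Out[12]:
      0     1
0  True  True
1  True  True
```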
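In the question, `uat` and `prod` come from two separate files, so their row labels likely differ even if the rows are meant to line up one-to-one. Sorting is the right fix when the labels identify the same rows in both frames. If the rows should instead be matched by position, one option is to discard the labels with `reset_index(drop=True)` before comparing. The frames below are hypothetical stand-ins for `uat` and `prod` with made-up values, and `compare_by_position` is a hypothetical helper:

```python
import pandas as pd

# Hypothetical stand-ins for the question's uat/prod frames
uat = pd.DataFrame(
    {"Customer Number": [101, 102], "Product": ["A", "B"]},
    index=[74356, 74357],
)
prod = pd.DataFrame(
    {"Customer Number": [101, 102], "Product": ["A", "C"]},
    index=[0, 1],
)


def compare_by_position(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Compare two frames row by row, ignoring their original row labels.

    Assumes both frames have the same number of rows and the same set of
    columns (in any order); otherwise `==` still raises.
    """
    a = a.reset_index(drop=True).sort_index(axis=1)
    b = b.reset_index(drop=True).sort_index(axis=1)
    return a == b


result = compare_by_position(uat, prod)
print(result)
# Keep only the rows where at least one column differs
print(result[~result.all(axis=1)])
```

Note that this treats the first row of one file as the counterpart of the first row of the other; if the files can list customers in different orders, sorting both frames by a key column first may be more appropriate.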