user2320577 - 1 year ago 103
Python Question

# Compare two group results

I have the following 2 dataframe:

``````DF1:
DATE           ID_1 ID_2 RESULT
0  2014-06-16     1    a    RED
1  2014-07-01     1    a    WHITE
2  2014-08-16     2    c    BLUE
3  2015-08-16     3    a    RED

DF2
DATE           ID_1 ID_2 RESULT
0  2014-06-16     1    z    WHITE
1  2014-07-01     1    z    WHITE
2  2014-08-16     2    h    BLUE
3  2014-08-16     3    k    RED
``````

That you can obtain by running this:

``````df1 = pd.DataFrame(columns=["DATE","ID_1", "ID_2", "RESULT" ])
df2 = pd.DataFrame(columns=["DATE","ID_1", "ID_2","RESULT"])

df1["DATE"] = ['2014-06-16', '2014-07-01', '2014-08-16', '2015-08-16']
df1['ID_1'] = [1,1,2,3]
df1['ID_2'] = ['a', 'a', 'c', 'a']
df1['RESULT'] = ['RED', 'WHITE', 'BLUE', 'RED']

df2["DATE"] = ['2014-06-16', '2014-07-01', '2014-08-16' ,  '2014-08-16']
df2['ID_1'] = [1,1,2,3]
df2['ID_2'] = ['z', 'z', 'h', 'k']
df2['RESULT'] = ['WHITE', 'WHITE', 'BLUE', 'RED']
``````

Now I need to groupby "ID_1" on both and compare if all columns (except ID_2) are equals. Ideally by showing the differences

The result should be like:

`````` DATE           ID_1 ID_2x ID2y  RESULTx RESULTy
2014-06-16     1    z     a     WHITE   RED
``````

I tried by grouping as follow:

`````` grp1 = df1.groupby("ID_1")
grp2 = df2.groupby("ID_1")

for (g1,g2) in zip(grp1,grp2):
g1[1][["DATE", "RESULT"]] != g2[1][["DATE", "RESULT"]]
``````

But I think is not efficient. Moreover I get a comparison error:

``````ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
``````

Any idea about how to proceed ?

Thanks!

``````cols = ['DATE', 'ID_1', 'RESULT']