user308827 user308827 - 11 days ago 12
Python Question

Pandas join issue: columns overlap but no suffix specified

I have following 2 data frames:

df_a =

mukey DI PI
0 100000 35 14
1 1000005 44 14
2 1000006 44 14
3 1000007 43 13
4 1000008 43 13

df_b =
mukey niccdcd
0 190236 4
1 190237 6
2 190238 7
3 190239 4
4 190240 7


When I try to join these 2 dataframes:

join_df = LS_sgo.join(MSU_pi,on='mukey',how='left')


I get the error:

*** ValueError: columns overlap but no suffix specified: Index([u'mukey'], dtype='object')


Why is this so? The dataframes do have common 'mukey' values.

Answer

Your error on the snippet of data you posted is a little cryptic, in that because there are no common values, the join operation fails because the values don't overlap it requires you to supply a suffix for the left and right hand side:

In [173]:

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
Out[173]:
       mukey_left  DI  PI  mukey_right  niccdcd
index                                          
0          100000  35  14          NaN      NaN
1         1000005  44  14          NaN      NaN
2         1000006  44  14          NaN      NaN
3         1000007  43  13          NaN      NaN
4         1000008  43  13          NaN      NaN

merge works because it doesn't have this restriction:

In [176]:

df_a.merge(df_b, on='mukey', how='left')
Out[176]:
     mukey  DI  PI  niccdcd
0   100000  35  14      NaN
1  1000005  44  14      NaN
2  1000006  44  14      NaN
3  1000007  43  13      NaN
4  1000008  43  13      NaN