ek-ok ek-ok - 1 year ago 125
Python Question

pandas merge on columns with different names and avoid duplicates

How can I merge two pandas DataFrames on two columns with different names and keep one of the columns?

df1 = pd.DataFrame({'UserName': [1,2,3], 'Col1':['a','b','c']})
df2 = pd.DataFrame({'UserID': [1,2,3], 'Col2':['d','e','f']})
pd.merge(df1, df2, left_on='UserName', right_on='UserID')

This provides a DataFrame like this

enter image description here

But clearly I am merging on
so they are the same. I want it to look like this. Is there any clean ways to do this?

enter image description here

Only the ways I can think of are either re-naming the columns to be the same before merge, or droping one of them after merge. I would be nice if pandas automatically drops one of them or I could do something like

pd.merge(df1, df2, left_on='UserName', right_on='UserID', keep_column='left')

Answer Source

How about set the UserID as index and then join on index for the second data frame?

pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)

#   Col1    UserName    Col2
# 0    a           1       d
# 1    b           2       e
# 2    c           3       f