Bob Hopez Bob Hopez - 5 months ago 8
Python Question

Merging two or more columns which don't overlap

Follow up to this post:
Merging two columns which don't overlap and create new columns

import pandas as pd

df1 = pd.DataFrame([["2014", "q2", 2],
["2013", "q1", 1],],
columns=('Year', 'Quarter', 'Value'))

df2 = pd.DataFrame([["2016", "q1", 3],
["2015", "q1", 3]],
columns=('Year', 'Quarter', 'Value'))


print(df1.merge(df2, on='Year', how='outer'))


Results in:

Year Quarter_x Value_x Quarter_y Value_y
0 2014 q2 2 NaN NaN
1 2013 q1 1 NaN NaN
2 2016 NaN NaN q1 3
3 2015 NaN NaN q1 3


But I want to get this:

Year Quarter Value
0 2014 q2 2
1 2013 q1 1
2 2016 q1 3
3 2015 q1 3


Note: This doesn't produce the desired result... :(

print(df1.merge(df2, on=['Year', 'Quarter','Value'], how='outer').dropna())

Year Quarter Value
0 2014 q2 2
1 2013 q1 1


... using 'left' or right' or inner also don't cut it.

Answer

Not sure what's happening here, but if I do

df1.merge(df2, on=['Year', 'Quarter', 'Value'], how='outer').dropna()

I get:

   Year Quarter  Value
0  2014      q2    2.0
1  2013      q1    1.0
2  2016      q1    3.0
3  2015      q1    3.0

You may want to take a look at the merge, join & concat docs.

The most 'intuitive' way for this is probably .append():

df1.append(df2)

   Year Quarter  Value
0  2014      q2    2.0
1  2013      q1    1.0
2  2016      q1    3.0
3  2015      q1    3.0

If you look into the source code, you'll find it calls concat behind the scenes.

Merge is useful and intended for cases where you have columns with overlapping values.

Comments