dbulgakov dbulgakov - 1 month ago 6
Python Question

How do I join two dataframes based on values in selected columns?

I am trying to join (merge) two dataframes based on values in each column.
For instance, to merge by values in columns in A and B.
So, having df1

A B C D
0 4 3 1 5
1 5 7 0 3
2 3 2 1 6


And df2

A B E F
0 4 3 4 5
1 5 7 3 3
2 3 8 5 5


I want to get a d3 with such structure

A B C D E F
0 4 3 1 5 4 5
1 5 7 0 3 3 3
2 3 2 1 6 Nan Nan
3 3 8 Nan Nan 5 5


Can you, please help me? I've tried both merge and join methods but havent succeed.

Answer

you can use pd.merge(..., how='outer') method:

In [193]: pd.merge(a,b, on=['A','B'], how='outer')
Out[193]:
   A  B    C    D    E    F
0  4  3  1.0  5.0  4.0  5.0
1  5  7  0.0  3.0  3.0  3.0
2  3  2  1.0  6.0  NaN  NaN
3  3  8  NaN  NaN  5.0  5.0

Data:

In [194]: a
Out[194]:
   A  B  C  D
0  4  3  1  5
1  5  7  0  3
2  3  2  1  6

In [195]: b
Out[195]:
   A  B  E  F
0  4  3  4  5
1  5  7  3  3
2  3  8  5  5