I have two dataframes that I'm trying to join in pandas (version 0.18.1).
test1 = pd.DataFrame({'id': range(1,6), 'place': ['Kent','Lenawee','Washtenaw','Berrien','Ottawa']})
id_1 place
0 1 Kent
1 2 Lenawee
2 3 Montreal
3 4 Berrien
4 5 Ottawa
test2 = pd.DataFrame({'id_2': range(6,11), 'id_parent': range(1,6)})
id_2 id_parent
0 6 1
1 7 2
2 8 3
3 9 4
4 10 5
df = test2.join(test1,on='id_parent',how='left')
id_2 id_parent id_1 place
0 6 1 2 Lenawee
1 7 2 3 Montreal
2 8 3 4 Berrien
3 9 4 5 Ottawa
4 10 5 NaN NaN
join
joins primarily on indices, use merge
for this:
In [18]:
test2.merge(test1,left_on='id_parent', right_on='id')
Out[18]:
id_2 id_parent id place
0 6 1 1 Kent
1 7 2 2 Lenawee
2 8 3 3 Washtenaw
3 9 4 4 Berrien
4 10 5 5 Ottawa
You get the NaN
because the rhs will use the rhs index and there is no entry for 0
and 5
so you get NaN