rossbot rossbot - 2 months ago 19
Python Question

pandas function to fill missing values from other dataframe based on matching column?

So I have two dataframes: one where certain columns are filled in and one where others are filled in but some from the previous df are missing. Both share some common non-empty columns.



DF1:
FirstName Uid JoinDate BirthDate
Bob 1 20160628 NaN
Charlie 3 20160627 NaN

DF2:
FirstName Uid JoinDate BirthDate
Bob 1 NaN 19910524
Alice 2 NaN 19950403

Result:
FirstName Uid JoinDate BirthDate
Bob 1 20160628 19910524
Alice 2 NaN 19950403
Charlie 3 20160627 NaN


Assuming that these rows do not share index positions in their respective dataframes, is there a way that I can fill the missing values in DF1 with values from DF2 where the rows match on a certain column (in this example Uid)?

Also, is there a way to create a new entry in DF1 from DF2 if there isn't a match on that column (e.g. Uid) without removing rows in DF1 that don't match any rows in DF2?

EDIT: I updated the dataframes to add non-matching results in both dataframes that I need in the result df. I also updated my last question to reflect that.

Answer

UPDATE: you can do it setting the proper indices and finally resetting the index of joined DF:

In [14]: df1.set_index('FirstName').combine_first(df2.set_index('FirstName')).reset_index()
Out[14]:
  FirstName  Uid    JoinDate   BirthDate
0     Alice  2.0         NaN  19950403.0
1       Bob  1.0  20160628.0  19910524.0
2   Charlie  3.0  20160627.0         NaN

try this:

In [113]: df2.combine_first(df1)
Out[113]:
  FirstName  Uid    JoinDate  BirthDate
0       Bob    1  20160628.0   19910524
1     Alice    2         NaN   19950403
Comments