Stacey Stacey - 2 months ago 5x
Python Question

two dataframes into one

Not sure if this is possible. I have two dataframes df1 and df2 which are presented like this:

df1 df2

id value id value
a 5 a 3
c 9 b 7
d 4 c 6
f 2 d 8
e 2
f 1

They will have many more entries in reality than presented here. I would like to create a third dataframe df3 based on the values in df1 and df2. Any values in df1 would take precedence over values in df2 when writing to df3 (if the same id is present in both df1 and df2) so in this example I would return:

id value
a 5
b 7
c 9
d 4
e 2
f 2

I have tried using df2 as the base (df2 will have all of the id's present for the whole universe) and then overwriting the value for id's that are present in df1, but cannot find the merge syntax to do this. Any help would be greatly appreciated.



You could use combine_first, provided that you first make the DataFrame index id (so that the values get aligned by id):

In [80]: df1.set_index('id').combine_first(df2.set_index('id')).reset_index()
  id  value
0  a    5.0
1  b    7.0
2  c    9.0
3  d    4.0
4  e    2.0
5  f    2.0

Since you mentioned merging, you might be interested in seeing that you could merge df1 and df2 on id, and then use fillna to replace NaNs in df1's the value column with values from df2's value column:

df1 = pd.DataFrame({'id': ['a', 'c', 'd', 'f'], 'value': [5, 9, 4, 2]})
df2 = pd.DataFrame({'id': ['a', 'b', 'c', 'd', 'e', 'f'], 'value': [3, 7, 6, 8, 2, 1]})
result = pd.merge(df2, df1, on='id', how='left', suffixes=('_x', ''))
result['value'] = result['value'].fillna(result['value_x'])
result = result[['id', 'value']]

yields the same result, though the first method is simpler.