Dror Dror - 3 months ago 22
Python Question

Merge two DataFrames with the same schema

Assume you have two DataFrames

foo = pd.DataFrame([[123,321],[1543,432]], columns=['id','location'])
bar = pd.DataFrame([[123,421],[1543,436]], columns=['location','id'])


which you want to merge into one big table. However, as the schema (columns) are the same, in the resulting table a new column indicating the "type" should be added:

id location type
0 123 321 foo
1 1543 432 foo
0 421 123 bar
1 436 1543 bar


Currently, what I do is

foo['type'] = ['foo'] * foo.shape[0]
bar['type'] = ['bar'] * bar.shape[0]
pd.concat([foo,bar])


Is there some smarter way to do it and in particular, avoid the first two lines?

Answer Source

You can add parameter keys to concat - it create MultiIndex:

print (pd.concat([foo,bar], keys=('foo','bar')))
         id  location
foo 0   123       321
    1  1543       432
bar 0   421       123
    1   436      1543

And for MultiIndex to column use:

df = pd.concat([foo,bar], keys=('foo','bar'))
       .reset_index(drop=True, level=1)
       .rename_axis('type')
       .reset_index()
print (df)
  type    id  location
0  foo   123       321
1  foo  1543       432
2  bar   421       123
3  bar   436      1543