sebap123 sebap123 - 5 months ago 19
Python Question

Merging multiple dataframes on column

I am trying to merge/join multiple

s and so far I have no luck. I've found
method, but it works only with two Dataframes. I also found this SO answer suggesting to do something like that:


Unfortunatelly it will not work in my case, because I have 20+ number of dataframes.

My next idea was to use
. According to the reference when joining multiple dataframes I need to use list and only I can join on index column. So I changed indexes for all of the columns (ok, it can be done grammatically easily) and end up with something like this:


Unfortunately, also this approach failed, because other columns names are this same in all dataframes. I've decided to do the last thing, that is renaming all columns. But when I finally joined everything:
df = pd.Dataframe()

I've received empty dataframe. I have no more idea, how I can join them. Can someone suggest anything more?


Sample input:

import pandas as pd

df1 = pd.DataFrame(np.array([
['a', 5, 19],
['b', 14, 16],
['c', 4, 9]]),
columns=['name', 'attr1', 'attr2'])
df2 = pd.DataFrame(np.array([
['a', 15, 49],
['b', 4, 36],
['c', 14, 9]]),
columns=['name', 'attr1', 'attr2'])

name attr1 attr2
0 a 5 19
1 b 14 16
2 c 4 9

name attr1 attr2
0 a 15 49
1 b 4 36
2 c 14 9

Expected output:

name attr1_1 attr2_1 attr1_2 attr2_2
0 a 5 19 15 49
1 b 14 16 4 36
2 c 4 9 14 9

Indexes might be unordered between dataframes, but it is guaranteed, that they will exists.


use pd.concat

dflist = [df1, df2]
keys = ["%d" % i for i in range(1, len(dflist) + 1)]

merged = pd.concat([df.set_index('name') for df in dflist], axis=1, keys=keys)
merged.columns = merged.swaplevel(0, 1, 1).columns.to_series().str.join('_')


enter image description here



enter image description here