Ludvig Ludvig - 3 years ago 131
Python Question

How to merge multiindex column dataframe

I want to merge static data with time varying data.

First dataframe

a_columns = pd.MultiIndex.from_product([["A","B","C"],["1","2"]])
a_index = pd.date_range("20100101","20110101",freq="BM")
a = pd.DataFrame(columns=a_columns,index=a_index)#A


Second dataframe

b_columns = ["3","4","5"]
b_index = ["A","B","C"]
b = pd.DataFrame(columns=b_columns,index=b_index)


How do i join these two? My desired dataframe has the form as A but with additional columns.

Thanks!

Answer Source

I think you need reshape by stack and then create df by to_frame - for concat need Datetimeindex, so new index was from first value of index of a.

Last concat + sort_index:

#added some data - 2
a_columns = pd.MultiIndex.from_product([["A","B","C"],["1","2"]])
a_index = pd.date_range("20100101","20110101",freq="BM")
a = pd.DataFrame(2,columns=a_columns,index=a_index)#A

#added some data - 1
b_columns = ["3","4","5"]
b_index = ["A","B","C"]
b = pd.DataFrame(1,columns=b_columns,index=b_index)

c = b.stack().to_frame(a.index[0]).T
print (c)
            A        B        C      
            3  4  5  3  4  5  3  4  5
2010-01-29  1  1  1  1  1  1  1  1  1

d = pd.concat([a,c], axis=1).sort_index(axis=1)
print (d)
            A                    B                    C                  
            1  2    3    4    5  1  2    3    4    5  1  2    3    4    5
2010-01-29  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-02-26  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-03-31  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-04-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-05-31  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-06-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-07-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-08-31  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-09-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-10-29  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-11-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-12-31  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN

Last if need replace NaNs only in added columns by first row:

d[c.columns] = d[c.columns].ffill()
print (d)
            A                    B                    C                  
            1  2    3    4    5  1  2    3    4    5  1  2    3    4    5
2010-01-29  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-02-26  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-03-31  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-04-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-05-31  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-06-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-07-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-08-31  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-09-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-10-29  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-11-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-12-31  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0

Similar solution with reindex:

c = b.stack().to_frame(a.index[0]).T.reindex(a.index, method='ffill')
print (c)
            A        B        C      
            3  4  5  3  4  5  3  4  5
2010-01-29  1  1  1  1  1  1  1  1  1
2010-02-26  1  1  1  1  1  1  1  1  1
2010-03-31  1  1  1  1  1  1  1  1  1
2010-04-30  1  1  1  1  1  1  1  1  1
2010-05-31  1  1  1  1  1  1  1  1  1
2010-06-30  1  1  1  1  1  1  1  1  1
2010-07-30  1  1  1  1  1  1  1  1  1
2010-08-31  1  1  1  1  1  1  1  1  1
2010-09-30  1  1  1  1  1  1  1  1  1
2010-10-29  1  1  1  1  1  1  1  1  1
2010-11-30  1  1  1  1  1  1  1  1  1
2010-12-31  1  1  1  1  1  1  1  1  1

d = pd.concat([a,c], axis=1).sort_index(axis=1)
print (d)
            A              B              C            
            1  2  3  4  5  1  2  3  4  5  1  2  3  4  5
2010-01-29  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-02-26  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-03-31  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-04-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-05-31  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-06-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-07-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-08-31  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-09-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-10-29  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-11-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-12-31  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download