BdB BdB - 3 months ago 14
Python Question

Preserve index and column order when combining DataFrames

Say we have the following DataFrames:

import pandas as pd
import numpy as np

df1_column_array = [['foo', 'bar'],
['A', 'B']]
df1_column_tuple = list(zip(*df1_column_array))
df1_column_header = pd.MultiIndex.from_tuples(df1_column_tuple)

df1_index_array = [['one','two'],
['0', '1']]
df1_index_tuple = list(zip(*df1_index_array))
df1_index_header = pd.MultiIndex.from_tuples(df1_index_tuple)


df1 = pd.DataFrame(np.random.rand(2,2), columns = df1_column_header, index = df1_index_header)
print(df1)
foo bar
A B
one 1 0.755296 0.101329
two 2 0.925653 0.587948

df2_column_array = [['alpha', 'beta'],
['C', 'D']]
df2_column_tuple = list(zip(*df2_column_array))
df2_column_header = pd.MultiIndex.from_tuples(df2_column_tuple)

df2_index_array = [['three', 'four'],
['3', '4']]
df2_index_tuple = list(zip(*df2_index_array))
df2_index_header = pd.MultiIndex.from_tuples(df2_index_tuple)


df2 = pd.DataFrame(np.random.rand(2,2), columns = df2_column_header, index = df2_index_header)
print(df2)
alpha beta
C D
three 3 0.751013 0.957824
four 4 0.879353 0.045079


I would like to combine these DataFrames to produce:

foo bar alpha beta
A B C D
one 1 0.755296 0.101329 NaN NaN
two 2 0.925653 0.587948 NaN NaN
three 3 NaN NaN 0.751013 0.957824
four 4 NaN NaN 0.879353 0.045079


When I try concat, the order of the indices are preserved, but not of the columns:

df_joined = pd.concat([df1,df2])
print(df_joined)
alpha bar beta foo
C B D A
one 1 NaN 0.101329 NaN 0.755296
two 2 NaN 0.587948 NaN 0.925653
three 3 0.751013 NaN 0.957824 NaN
four 4 0.879353 NaN 0.045079 NaN


When I try join, the order of the columns are preserved, but not of the indices:

df_joined = df1.join(df2, how = 'outer')
print(df_joined)
foo bar alpha beta
A B C D
four 4 NaN NaN 0.879353 0.045079
one 1 0.755296 0.101329 NaN NaN
three 3 NaN NaN 0.751013 0.957824
two 2 0.925653 0.587948 NaN NaN


How can I preserve the order of both columns and indices when combining DataFrames?

Please note: this is sample data. My real world data does not have convenient labels (e.g. 1,2,3,4) to sort on.

Answer

You can use hack - first concat and get Multiindex and then reindex output of second concat:

idx = pd.concat([df1,df2]).index
df_joined = pd.concat([df1,df2], axis=1).reindex(idx)
print (df_joined)
              foo       bar     alpha      beta
                A         B         C         D
one   0  0.269298  0.819375       NaN       NaN
two   1  0.574702  0.798920       NaN       NaN
three 3       NaN       NaN  0.436893  0.822041
four  4       NaN       NaN  0.757332  0.271900

Faster solution with creating DataFrames with Multiindexes, concat it and get index:

idx = pd.concat([pd.DataFrame(df1.index, index=df1.index),
                 pd.DataFrame(df2.index, index=df2.index)]).index
df_joined = pd.concat([df1,df2], axis=1).reindex(idx)
print (df_joined)
              foo       bar     alpha      beta
                A         B         C         D
one   0  0.007644  0.341335       NaN       NaN
two   1  0.332005  0.449688       NaN       NaN
three 3       NaN       NaN  0.281876  0.883299
four  4       NaN       NaN  0.880252  0.061797
Comments