fukiburi fukiburi - 29 days ago 4
Python Question

Appending two DataFrames and sorting columns with exception of first two

I'd like to concatenate two data frames, created from two lists:

import pandas as pd
import numpy as np

header_1 = ['A', 'B', -1, 3, 5, 7]
data_1 = ['X', 'Y', 1, 2, 3, 4]
d = pd.DataFrame(np.array([data_1]), columns=header_1)

header_2 = ['A', 'B', -2, 4, 5, 6]
data_2 = ['X', 'Z', 1, 2, 3, 4]
e = pd.DataFrame(np.array([data_2]), columns=header_2)

f = pd.concat([d, e])

> f
A B -1 3 5 7 -2 4 6
0 X Y 1 2 3 4 NaN NaN NaN
0 X Z NaN NaN 3 NaN 1 2 4


However, I want my numerical columns to appear in sorted order and was wondering if there an easier way than splitting off the first two columns, sorting the remaining dataframe and concatenating the two again:

ab_cols = f[['A', 'B']] # Copy of first two columns
g = f.drop(['A', 'B'], axis=1) # Removing cols from dataframe
h = g.sort_index(axis=1) # Sort remaining by column header
i = pd.concat([ab_cols, h], axis=1) # Putting everything together again

> i
A B -2 -1 3 4 5 6 7
0 X Y NaN 1 2 NaN 3 NaN 4
0 X Z 1 NaN NaN 2 3 4 NaN


I've thought about multi-indices, but I'm already using the index for something else (source of data row, not shown here), and I'm afraid a three-level multi-index might make it more complicated for slicing the dataframe later.

Answer

Steps:

Make columns as a series representation with both index and values equal to the index keys.

Using pd.to_numeric with errors=coerce, to properly parse numeric values and handling string values as Nans.

Sort these values with pushing Nans(which were string values before) on top as and when they are encountered.

Taking their corresponding indices and re-arranging the DF based on these newly returned column labels.

c = pd.to_numeric(f.columns.to_series(), errors='coerce').sort_values(na_position='first')
f[c.index]

Image