Ben Ben - 1 month ago 8
Python Question

How do I sort a dataframe's columns based on the values in multiple rows?

For example,

df = pd.DataFrame({'x':[1,1,1,2,3,3], 'y':['a','a','c','b','b','b']})
ct = pd.crosstab(df.x, df.y)
ct
y a b c
x
1 2 0 1
2 0 1 0
3 0 2 0


How do I sort the columns of ct based on the values in row1, row2, and row3 (in that order of priority)?

I've tried the following, neither of which work

ct.sort_values([1, 2, 3], axis=1)
ct.sort_values(['1','2','3'], axis=1)

Answer

This cannot currently be done with a direct call to sort_values. There is an open bug report about it.

You can still do it less nicely by transposing, sorting by columns, then transposing again:

>>> ct.T.sort_values([1, 2, 3]).T
y  b  c  a
x         
1  0  1  2
2  1  0  0
3  2  0  0