piRSquared - 1 year ago 122
Python Question

# twist dataframe by rank

Consider the dataframe `df`:

``````
import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(4, 5), columns=list('ABCDE'))
df
``````

I want a dataframe where the columns are ranks and each row holds the column labels `['A', 'B', 'C', 'D', 'E']` arranged in rank order.

Ranks:

``````
df.rank(axis=1).astype(int)
``````
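To make the link between the ranks and the wanted layout concrete, here is a minimal sketch that is not part of the original post. It hard-codes the values printed for `df` in the answer (six decimals) so it does not depend on the random seed, and it assumes there are no ties within a row. The names `out` and `twisted` are only illustrative.

```python
import numpy as np
import pandas as pd

# Values copied from the printed df in the answer (rounded to six decimals).
df = pd.DataFrame(
    [[0.444939, 0.407554, 0.460148, 0.465239, 0.462691],
     [0.016545, 0.850445, 0.817744, 0.777962, 0.757983],
     [0.934829, 0.831104, 0.879891, 0.926879, 0.721535],
     [0.117642, 0.145906, 0.199844, 0.437564, 0.100702]],
    columns=list("ABCDE"),
)

# 1-based rank of every value within its row (assumes no ties).
ranks = df.rank(axis=1).astype(int)

# Put each column label at position (rank - 1) of its row.
out = np.empty(df.shape, dtype=object)
rows = np.arange(df.shape[0])[:, None]  # column vector of row numbers
out[rows, ranks.to_numpy() - 1] = df.columns.to_numpy()

twisted = pd.DataFrame(out, index=df.index, columns=range(1, df.shape[1] + 1))
print(twisted)
```

In other words, the wanted frame is the inverse of the rank permutation in each row, which is why sorting approaches based on `argsort` should produce the same layout.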

expected results

Here's one way:

``````
In [90]: df
Out[90]:
          A         B         C         D         E
0  0.444939  0.407554  0.460148  0.465239  0.462691
1  0.016545  0.850445  0.817744  0.777962  0.757983
2  0.934829  0.831104  0.879891  0.926879  0.721535
3  0.117642  0.145906  0.199844  0.437564  0.100702

In [91]: df2 = df.apply(lambda row: df.columns[np.argsort(row)], axis=1)

In [92]: df2
Out[92]:
   A  B  C  D  E
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D
``````

The new DataFrame keeps the column labels of `df`, but they can be replaced with the ranks 1 through 5:

``````
In [93]: df2.columns = range(1, 1 + df2.shape[1])

In [94]: df2
Out[94]:
   1  2  3  4  5
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D
``````
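A note on versions, offered as a hedged sketch rather than a correction of the session above: depending on the pandas version, `apply` with a function that returns a list-like (such as an `Index`) may give back a Series of objects instead of a DataFrame. Passing `result_type="expand"` asks pandas to spread the list-like over columns. The `df` below hard-codes the printed values, and `by_rank` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Values copied from the printed df above (rounded to six decimals).
df = pd.DataFrame(
    [[0.444939, 0.407554, 0.460148, 0.465239, 0.462691],
     [0.016545, 0.850445, 0.817744, 0.777962, 0.757983],
     [0.934829, 0.831104, 0.879891, 0.926879, 0.721535],
     [0.117642, 0.145906, 0.199844, 0.437564, 0.100702]],
    columns=list("ABCDE"),
)

# Sort each row's values and return the matching column labels as a list;
# result_type="expand" turns each returned list into one row of columns.
by_rank = df.apply(
    lambda row: list(df.columns[np.argsort(row.to_numpy())]),
    axis=1,
    result_type="expand",
)

# Relabel the columns with the ranks 1..n, as in the session above.
by_rank.columns = range(1, 1 + by_rank.shape[1])
print(by_rank)
```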

Here's another way. This one converts the DataFrame to a numpy array, applies `argsort` on axis 1, uses that to index `df.columns`, and puts the result back into a DataFrame.

``````
In [110]: pd.DataFrame(df.columns[np.array(df).argsort(axis=1)], columns=range(1, 1 + df.shape[1]))
Out[110]:
   1  2  3  4  5
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D
``````
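The same numpy idea can be wrapped in a small function. This is only a sketch under a few assumptions: `twist_by_rank` is a hypothetical helper name, the stable sort is my choice so that tied values keep their original column order, and the row labels `w` to `z` are invented to show that the index is carried over.

```python
import numpy as np
import pandas as pd


def twist_by_rank(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the column labels of each row, ordered by that row's values."""
    # Positions that would sort each row in ascending order.
    order = np.argsort(frame.to_numpy(), axis=1, kind="stable")
    # Fancy-index the labels with the 2-D position array.
    labels = frame.columns.to_numpy()[order]
    return pd.DataFrame(
        labels,
        index=frame.index,                      # keep the original row labels
        columns=range(1, frame.shape[1] + 1),   # ranks 1..n as column labels
    )


# Values copied from the printed df above; the index is made up for the demo.
df = pd.DataFrame(
    [[0.444939, 0.407554, 0.460148, 0.465239, 0.462691],
     [0.016545, 0.850445, 0.817744, 0.777962, 0.757983],
     [0.934829, 0.831104, 0.879891, 0.926879, 0.721535],
     [0.117642, 0.145906, 0.199844, 0.437564, 0.100702]],
    columns=list("ABCDE"),
    index=list("wxyz"),
)

result = twist_by_rank(df)
print(result)
```

Unlike the one-liner above, which builds the result with a default integer index, this version passes `index=frame.index` so a non-default index survives the round trip.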