piRSquared - 2 months ago
Python Question

twist dataframe by rank

Consider the dataframe df:

import numpy as np
import pandas as pd

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(4, 5), columns=list('ABCDE'))
df


I want a dataframe where the columns are the ranks and each row holds the labels ['A', 'B', 'C', 'D', 'E'] arranged in that row's rank order.

Ranks:

df.rank(axis=1).astype(int)
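
As a minimal sketch, the ranks can be reproduced from the six-decimal values printed in the answer below. Building the frame from those literals instead of the seeded random draw is an assumption made here so the example does not depend on the random number generator.

```python
import pandas as pd

# Values copied from the df printed in the answer (rounded to six decimals).
df = pd.DataFrame(
    [[0.444939, 0.407554, 0.460148, 0.465239, 0.462691],
     [0.016545, 0.850445, 0.817744, 0.777962, 0.757983],
     [0.934829, 0.831104, 0.879891, 0.926879, 0.721535],
     [0.117642, 0.145906, 0.199844, 0.437564, 0.100702]],
    columns=list('ABCDE'),
)

# Rank within each row (axis=1); rank 1 marks the smallest value in the row.
ranks = df.rank(axis=1).astype(int)
print(ranks)
```

In row 0, for example, B has rank 1 and A has rank 2, so the desired output row starts with B, A.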


Expected results:

[image of the expected result not preserved]

Answer

Here's one way:

In [90]: df
Out[90]: 
          A         B         C         D         E
0  0.444939  0.407554  0.460148  0.465239  0.462691
1  0.016545  0.850445  0.817744  0.777962  0.757983
2  0.934829  0.831104  0.879891  0.926879  0.721535
3  0.117642  0.145906  0.199844  0.437564  0.100702

In [91]: df2 = df.apply(lambda row: df.columns[np.argsort(row)], axis=1)

In [92]: df2
Out[92]: 
   A  B  C  D  E
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D

The new DataFrame has the same column index as df, but that can be fixed:

In [93]: df2.columns = range(1, 1 + df2.shape[1])

In [94]: df2
Out[94]: 
   1  2  3  4  5
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D
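
On newer pandas releases, the `apply` call above may return a Series of `Index` objects rather than a DataFrame, because the lambda returns a plain list-like. A minimal sketch that sidesteps this by returning a `pd.Series` from the row function follows. The helper name `labels_by_rank` and the literal data (copied from the printed `df`) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Values copied from the df printed above (rounded to six decimals).
df = pd.DataFrame(
    [[0.444939, 0.407554, 0.460148, 0.465239, 0.462691],
     [0.016545, 0.850445, 0.817744, 0.777962, 0.757983],
     [0.934829, 0.831104, 0.879891, 0.926879, 0.721535],
     [0.117642, 0.145906, 0.199844, 0.437564, 0.100702]],
    columns=list('ABCDE'),
)

def labels_by_rank(row):
    # Positions that would sort this row's values in ascending order.
    order = np.argsort(row.to_numpy())
    # Returning a Series makes apply expand the result into columns,
    # here labelled 1..n directly.
    return pd.Series(df.columns.to_numpy()[order],
                     index=range(1, 1 + len(row)))

df2 = df.apply(labels_by_rank, axis=1)
print(df2)
```

Labelling the returned Series with 1..n also removes the need to rename the columns afterwards.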

Here's another way. This one converts the DataFrame to a numpy array, applies argsort on axis 1, uses that to index df.columns, and puts the result back into a DataFrame.

In [110]: pd.DataFrame(df.columns.values[np.array(df).argsort(axis=1)], columns=range(1, 1 + df.shape[1]))
Out[110]: 
   1  2  3  4  5
0  B  A  C  E  D
1  A  E  D  C  B
2  E  B  C  D  A
3  E  A  B  C  D
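
The steps of the NumPy approach can be unpacked as follows. This is a sketch built from the values printed earlier. `df.to_numpy()` stands in for `np.array(df)`, and the column labels are converted to a NumPy array before indexing, an assumption made here because recent pandas versions may not accept a 2-D indexer on an `Index`. The final check against `df.rank` is an addition for illustration.

```python
import numpy as np
import pandas as pd

# Values copied from the df printed above (rounded to six decimals).
df = pd.DataFrame(
    [[0.444939, 0.407554, 0.460148, 0.465239, 0.462691],
     [0.016545, 0.850445, 0.817744, 0.777962, 0.757983],
     [0.934829, 0.831104, 0.879891, 0.926879, 0.721535],
     [0.117642, 0.145906, 0.199844, 0.437564, 0.100702]],
    columns=list('ABCDE'),
)

values = df.to_numpy()                  # 1. DataFrame -> 2-D NumPy array
order = values.argsort(axis=1)          # 2. per-row positions in ascending order
labels = df.columns.to_numpy()[order]   # 3. pick column labels with that ordering
result = pd.DataFrame(labels, index=df.index,
                      columns=range(1, 1 + df.shape[1]))  # 4. back to a DataFrame
print(result)

# Sanity check: the label stored under column r must have rank r in that row.
ranks = df.rank(axis=1).astype(int)
for i in df.index:
    for r in result.columns:
        assert ranks.loc[i, result.loc[i, r]] == r
```

Because the argsort runs once over the whole array instead of once per row, this version avoids the per-row Python call that `apply` makes.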