HP Peng - 6 months ago
Python Question

What is wrong with this pandas dataframe.apply(lambda)?

I cannot understand why this code does not work. I am practicing df.apply with lambda. I expect the output to be a sorted df. Thanks.

import pandas as pd
import numpy as np

data = np.random.randn(10,5)
col = list('ABCDE') # assign column names


I want to create a new dataframe t that is a sorted df

df = pd.DataFrame(data, columns = col)
t = df.apply(lambda x: x.sort_values())

>>> df
          A         B         C         D         E
0  1.548097  0.682373 -1.254562 -0.249815  0.002013
1 -2.581173  0.946034 -1.389210 -0.877128 -1.569914
2 -0.980636  1.555700 -1.346029  0.180983  1.112470
3  0.724657  0.520718  0.122696  1.386643  0.060714
4 -0.119740 -0.665260 -1.085457  0.699085  1.149364
5 -0.004628 -0.479672 -0.641696  0.875471  0.826836
6  0.598497 -0.018560 -1.002511  0.478659  0.463565
7 -0.005159 -0.137165 -0.460209  0.284940  0.755981
8  0.576421  0.098833 -2.664028  0.118074 -0.426393
9 -0.223696 -0.589748 -0.733454 -0.254564 -0.519015

>>> t
          A         B         C         D         E
0  1.548097  0.682373 -1.254562 -0.249815  0.002013
1 -2.581173  0.946034 -1.389210 -0.877128 -1.569914
2 -0.980636  1.555700 -1.346029  0.180983  1.112470
3  0.724657  0.520718  0.122696  1.386643  0.060714
4 -0.119740 -0.665260 -1.085457  0.699085  1.149364
5 -0.004628 -0.479672 -0.641696  0.875471  0.826836
6  0.598497 -0.018560 -1.002511  0.478659  0.463565
7 -0.005159 -0.137165 -0.460209  0.284940  0.755981
8  0.576421  0.098833 -2.664028  0.118074 -0.426393
9 -0.223696 -0.589748 -0.733454 -0.254564 -0.519015

Answer

I'm assuming your question is: Why doesn't your code do what you wanted?

The lambda in .apply(lambda x: x.sort_values()) does sort each column. However, it returns a pd.Series, and a sorted Series keeps its original index labels attached to the values. When pandas recombines the columns into a DataFrame, it aligns them on those index labels, which undoes the sort you just did. To avoid that, return the underlying values (without the index):

df.apply(lambda x: x.sort_values().values)
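
To see the alignment effect side by side, here is a minimal sketch, assuming a reasonably recent pandas and NumPy. The seed and the names aligned, sorted_cols and sorted_np are picked only for illustration, and the np.sort variant at the end is an alternative that the answer above does not mention.

```python
import numpy as np
import pandas as pd

# Arbitrary seed so the example is reproducible (an assumption, not from the question).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 5)), columns=list("ABCDE"))

# Each lambda call returns a Series whose index labels travel with the values.
# pandas lines the columns up on those labels again, so the sort is undone.
aligned = df.apply(lambda x: x.sort_values())

# Returning a bare NumPy array drops the index, so there is nothing to align on.
# .to_numpy() is used here; .values behaves the same way for this purpose.
sorted_cols = df.apply(lambda x: x.sort_values().to_numpy())

# Alternative: sort every column at once with NumPy and rebuild the frame.
sorted_np = pd.DataFrame(np.sort(df.to_numpy(), axis=0), columns=df.columns)

print(aligned.head())      # same values as df
print(sorted_cols.head())  # every column ascending
```

Note that each column is sorted independently, so a row of the result no longer corresponds to a row of the original data.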