AGS AGS - 4 months ago 7
Python Question

Sort by one column's values, keeping rows grouped by another column's value

I have two (hundreds) df's that are generated and then concatenated, which I would then like to sort while keeping the rows with identical column

D
names in the original order:

In [120]: df_list[0]
Out[120]:

A B C D
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835

In [121]: df_list[1]
Out[121]:

A B C D
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002

In [122]: df = pd.concat(df_list[0:2])

In [122]: df
Out[122]:
A B C D
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002


Standard sorting produces:

In [125]: df.sort_values('A',ascending=False)
Out[125]:
A B C D
0 0.628844 0.614492 0.570333 MA1002
0 0.564678 0.598355 0.606693 MA0835
1 0.317790 0.293189 0.239368 MA1002
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002


However, I would like to sort on
A
and keep the row-groupings as specified by
D
. This is the desired output:

A B C D
0 0.628844 0.614492 0.570333 MA1002
1 0.317790 0.293189 0.239368 MA1002
2 0.000000 0.000000 0.000000 MA1002
3 0.000000 0.000000 0.000000 MA1002
4 0.000000 0.000000 0.000000 MA1002
0 0.564678 0.598355 0.606693 MA0835
1 0.066291 0.063587 0.662292 MA0835
2 0.000000 0.000000 0.010758 MA0835
3 0.000000 0.000000 0.097895 MA0835
4 0.000000 0.000000 0.136468 MA0835


Do I need to work with
groupby
, or is there another sorting/grouping technique I am unfamiliar with?

Answer

Use the keys argument in pd.concat

keys = [(df.A.iloc[0], i) for i, df in enumerate(list_of_dfs)]
pd.concat(list_of_dfs, keys=keys) \
    .sort_index(ascending=[False, True, True]) \
    .reset_index(drop=True)

enter image description here