sanaz - 9 months ago 67

Python Question

I have the following dataframe:

| id | p1 | p2 | p3 | p4 |
|----|----|----|----|----|
| 1  | 0  | 9  | 1  | 4  |
| 2  | 0  | 2  | 3  | 4  |
| 3  | 1  | 3  | 10 | 7  |
| 4  | 1  | 5  | 3  | 1  |
| 5  | 2  | 3  | 7  | 10 |

I need to reshape the dataframe so that, for each id, it lists the three columns with the highest values, ordered from largest to smallest. The result would look like this:

| id | top1 | top2 | top3  |
|----|------|------|-------|
| 1  | p2   | p4   | p3    |
| 2  | p4   | p3   | p2    |
| 3  | p3   | p4   | p2    |
| 4  | p2   | p3   | p4/p1 |
| 5  | p4   | p3   | p2    |

It shows the top 3 best sellers for every `user_id`.

`dplyr`

Answer Source

You could use `np.argsort` to find the indices of the *n* largest items for each row:

```
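import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'p1': [0, 0, 1, 1, 2],
                   'p2': [9, 2, 3, 5, 3],
                   'p3': [1, 3, 10, 3, 7],
                   'p4': [4, 4, 7, 1, 10]})
df = df.set_index('id')

nlargest = 3
# Negate the values so that an ascending argsort ranks the largest first,
# then keep the first `nlargest` column positions of each row.
order = np.argsort(-df.values, axis=1)[:, :nlargest]
# Convert the column Index to a NumPy array before indexing with the 2-D
# `order` array; recent pandas versions reject 2-D indexing on an Index.
result = pd.DataFrame(df.columns.to_numpy()[order],
                      columns=['top{}'.format(i) for i in range(1, nlargest+1)],
                      index=df.index)
print(result)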
```

yields

```
   top1 top2 top3
id
1    p2   p4   p3
2    p4   p3   p2
3    p3   p4   p2
4    p2   p3   p1
5    p4   p3   p2
```
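
Row 4 in the question has a tie (`p1` and `p4` are both 1). If the tie-break matters, one option is to ask `np.argsort` for a stable sort so that tied columns keep their left-to-right order. The sketch below wraps the same approach in a helper; the name `top_n_columns` is made up for illustration and is not part of pandas or NumPy, and it assumes the value columns are signed numbers so that negation is safe.

```python
import numpy as np
import pandas as pd


def top_n_columns(df, n=3):
    """Return, per row, the labels of the n columns with the largest values.

    Hypothetical helper for illustration. Ties go to the leftmost column.
    """
    # A stable sort of the negated values keeps tied columns in their
    # original left-to-right order.
    order = np.argsort(-df.to_numpy(), axis=1, kind="stable")[:, :n]
    # Index the column labels as a plain NumPy array with the 2-D positions.
    labels = df.columns.to_numpy()[order]
    return pd.DataFrame(
        labels,
        columns=["top{}".format(i) for i in range(1, n + 1)],
        index=df.index,
    )


df = pd.DataFrame({"id": [1, 2, 3, 4, 5],
                   "p1": [0, 0, 1, 1, 2],
                   "p2": [9, 2, 3, 5, 3],
                   "p3": [1, 3, 10, 3, 7],
                   "p4": [4, 4, 7, 1, 10]}).set_index("id")

result = top_n_columns(df, n=3)
print(result.loc[4].tolist())  # ['p2', 'p3', 'p1']
```

With a stable sort, the tie in row 4 always resolves to `p1`, because it appears before `p4` in the column order.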