neha - 4 years ago 76
Python Question

# calculating top 10 products for a given query based on its priority in python

suppose we are given a dataframe like:

``````                       Query  Productid  priority
index
0                        3ds    2125233  0.018946
1                        rca    2009324  0.027599
2                       nook    1517163  0.009443
3                        rca    2877125  0.012054
4                        rca    2877134  0.005557
5              flatscreentvs    2416092  0.011961
6                    macbook    3108172  0.010459
7                        3ds    2264036  0.165948
8                        rca    8280834  0.004006
9                 memorycard    2740208  0.013744
10               acpowercord    2584273  0.006865
11                zaggiphone    1230537  0.136073
12            watchthethrone    3168067  0.104679
13     remotecontrolextender    7997055  0.113058
14                 camcorder    2009041  0.017809
15                       3ds    1988047  0.031711
16                       3ds    1686079  0.043783
19                 samsung40    2126065  0.018066
``````

i want to find top 2
`product_ids`
on the basis of priority with respect to a given query.

for eg. if we have
`query=3ds`
then top 2 products should be:

``````1. 1988047
2. 1686079
``````

This would be an equivalent for Oracle's row_number() analytic function:

``````In [172]: df.assign(rn=df.sort_values('priority', ascending=0).groupby('Query').cumcount() + 1).query('rn < 3').sort_values(['Query','rn'])
Out[172]:
Query  Productid  priority  rn
index
7                        3ds    2264036  0.165948   1
16                       3ds    1686079  0.043783   2
10               acpowercord    2584273  0.006865   1
14                 camcorder    2009041  0.017809   1
5              flatscreentvs    2416092  0.011961   1
6                    macbook    3108172  0.010459   1
9                 memorycard    2740208  0.013744   1
2                       nook    1517163  0.009443   1
1                        rca    2009324  0.027599   1
3                        rca    2877125  0.012054   2
13     remotecontrolextender    7997055  0.113058   1
19                 samsung40    2126065  0.018066   1
12            watchthethrone    3168067  0.104679   1
11                zaggiphone    1230537  0.136073   1
``````

show `Productid` for selected `Query`:

``````In [180]: (df.assign(rn=df.sort_values('priority', ascending=0).groupby('Query').cumcount() + 1)
.....:    .query('Query=="3ds" and rn < 3')['Productid']
.....: )
Out[180]:
index
7     2264036
16    1686079
Name: Productid, dtype: int64
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download