alphanumeric - 1 month ago
Python Question

How to get integers instead of floats from DataFrame's rank method

To substitute the numbers with their corresponding "ranks":

import pandas as pd
import numpy as np

# random_integers is deprecated; randint excludes the upper bound, hence 10001
numbers = np.random.randint(low=0, high=10001, size=1000)
df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank()


The rank method returns floats by default:

987 82.0
988 36.5
989 526.0
990 219.0
991 957.0
992 819.5
993 787.5
994 513.0


I would rather have integers than floats. Converting the resulting floats with round() or astype(int) seems risky, since it could map nearby ranks to the same integer: 3.5 and 4.0 both round to 4, and astype(int) truncates, so 3.5 would become 3.
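
To make the concern concrete, here is a minimal sketch of where the fractional ranks come from. The sample values are made up for illustration and are not taken from the data above; the default method='average' gives tied values the mean of the positions they occupy.

```python
import pandas as pd

# Hypothetical sample with ties: 20 appears twice, 40 three times.
s = pd.Series([10, 20, 20, 30, 40, 40, 40])

# Default method='average': tied values share the mean of their positions,
# so a two-way tie at positions 2 and 3 yields 2.5 for both.
avg = s.rank()
print(avg.tolist())              # [1.0, 2.5, 2.5, 4.0, 6.0, 6.0, 6.0]

# astype(int) truncates toward zero instead of rounding.
print(avg.astype(int).tolist())  # [1, 2, 2, 4, 6, 6, 6]
```

The truncated ranks no longer say which tie-breaking rule produced them, and they skip numbers (3, 5), which is why casting the default ranks directly is not a clean solution.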

Is there any way to make the rank method output integers?

Answer

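Pass method='dense'. Tied values then share a rank, and the rank increases by exactly 1 from one group of ties to the next, so every rank is a whole number. The column is still stored as float64, but a subsequent .astype(int) is then lossless. See the docs: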

In [2]:

numbers = np.random.randint(low=0, high=10001, size=1000)
df = pd.DataFrame({'a': numbers})
df['a_rank'] = df['a'].rank(method='dense')
df
Out[2]:
        a  a_rank
0    1095     114
1    2514     248
2     500      53
3    6112     592
4    5582     533
5     851      91
6    2887     287
7    3798     366
8    4698     458
9    1699     170
10   4739     462
11   7199     693
12    817      88
13   3801     367
14   5584     534
15   4939     481
16   2569     258
17   6806     656
18     93       8
19   8574     816
20   4107     396
21   7086     684
22   6819     657
23   8844     847
24    170      15
25   6629     634
26   9905     950
27   5312     512
28   3794     365
29   9476     911
..    ...     ...
970  4607     447
971  8430     801
972  6527     625
973  2794     280
974  4414     425
975  1069     111
976  2849     285
977  7955     759
978  5767     547
979  7767     742
980  2956     294
981  5847     554
982  1029     107
983  4967     485
984   256      25
985  5577     532
986  6866     662
987  5903     563
988  1785     181
989   749      78
990  2164     212
991  1074     112
992  8752     837
993  2737     272
994  2761     277
995  7355     705
996  8956     857
997  4831     473
998   222      21
999  9531     917

[1000 rows x 2 columns]
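
If an integer dtype is needed rather than whole-number floats, the dense ranks can be cast afterwards. The following is a sketch under a few assumptions: the seed and the column name a_order are arbitrary choices for illustration, and the column contains no NaN values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame({'a': rng.integers(low=0, high=10001, size=1000)})

# rank() returns float64 for every method; dense ranks are whole numbers,
# so the cast to int loses nothing. Tied values keep the same rank.
df['a_rank'] = df['a'].rank(method='dense').astype(int)

# Alternative: method='first' breaks ties by order of appearance,
# so every row gets its own rank from 1 to len(df).
df['a_order'] = df['a'].rank(method='first').astype(int)

print(df.dtypes)
print(df.head())
```

With method='dense' the largest rank equals the number of distinct values in the column, while method='first' always ends at the number of rows. If the column can contain NaN, rank leaves those rows as NaN and astype(int) raises an error; casting to the nullable 'Int64' dtype is one possible workaround in that case.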