Sabor Sabor - 8 months ago 56
Python Question

Map discrete columns to the index of their unique values

I have a dataframe with an int column:

df = pd.DataFrame(data=2*np.random.randint(0, high=10, size=6), columns=['N'])

    N
0   8
1   4
2   8
3  14
4   2
5  18


I would like to generate another dataframe as:

df2 =

    N  ID
0   8   2
1   4   1
2   8   2
3  14   3
4   2   0
5  18   4


where ID is the position of each value in the sorted list of unique values in N.

I need a computationally cheap solution, as it has to run on large dataframes and be updated very often.

Answer Source

Use rank + sub + astype:

df['ID'] = df['N'].rank(method='dense').sub(1).astype(int)
print(df)
    N  ID
0   8   2
1   4   1
2   8   2
3  14   3
4   2   0
5  18   4
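
Since the question asks for something cheap, it may be worth timing a couple of alternatives on your own data. The sketch below is not part of the original answer: it rebuilds the sample column by hand (the literal values are copied from the question) and compares the dense-rank approach with pd.factorize(..., sort=True) and np.unique(..., return_inverse=True), both of which return integer codes into the sorted unique values directly.

```python
import numpy as np
import pandas as pd

# Sample values copied from the question (hard-coded so the result is reproducible)
df = pd.DataFrame({'N': [8, 4, 8, 14, 2, 18]})

# Approach from the answer: dense rank starts at 1, so subtract 1
by_rank = df['N'].rank(method='dense').sub(1).astype(int)

# Alternative 1: factorize with sort=True gives codes into the sorted uniques
codes, uniques = pd.factorize(df['N'], sort=True)

# Alternative 2: np.unique sorts the uniques and returns the inverse mapping
uniq, inverse = np.unique(df['N'].to_numpy(), return_inverse=True)

df['ID'] = codes
print(df)
print(list(uniques))
```

All three produce the same ID column on this sample. One difference to keep in mind: with missing values, rank returns NaN (so astype(int) fails), while factorize marks them with -1 by default.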