ihsansat ihsansat - 7 months ago 57
Python Question

How transform value column to quantile at pandas python?

I use pandas to analyze my data, and execute:

df = pd.DataFrame(datas, columns=['userid', 'recency', 'frequency', 'monetary'])


print df
userid recency frequency monetary
0 47918 9 53 788778
1 48302 85 10 232323
2 8873 3 79 2323
3 63158 23 23 2323232
4 364 14 43 232323
5 45191 1 75 224455
6 21061 9 64 23367
7 41356 22 55 2346777
8 42455 14 30 23478
9 65460 3 16 2345


I need to transform value
recency
frequency
and
monetary
into value in range 1-5. so output is

userid recency frequency monetary
0 47918 1 2 3
1 48302 2 1 2
2 8873 3 4 5
3 63158 2 2 2
4 364 5 4 2
5 45191 1 5 4
6 21061 4 4 3
7 41356 3 5 4
8 42455 5 3 5
9 65460 3 1 2


how can do that in python ?

thx

Answer

IIUC you need qcut with codes, last need add 1, because minimal value is 1 and maximal 5:

df['recency1'] = pd.qcut(df['recency'].values, 5)
df['frequency1'] = pd.qcut(df['frequency'].values, 5)
df['monetary1'] = pd.qcut(df['monetary'].values, 5)
print df
   userid  recency  frequency  monetary    recency1    frequency1  \
0   47918        9         53    788778      (3, 9]  (37.8, 53.8]   
1   48302       85         10    232323  (22.2, 85]    [10, 21.6]   
2    8873        3         79      2323      [1, 3]    (66.2, 79]   
3   63158       23         23   2323232  (22.2, 85]  (21.6, 37.8]   
4     364       14         43    232323     (9, 14]  (37.8, 53.8]   
5   45191        1         75    224455      [1, 3]    (66.2, 79]   
6   21061        9         64     23367      (3, 9]  (53.8, 66.2]   
7   41356       22         55   2346777  (14, 22.2]  (53.8, 66.2]   
8   42455       14         30     23478     (9, 14]  (21.6, 37.8]   
9   65460        3         16      2345      [1, 3]    [10, 21.6]   

              monetary1  
0   (232323, 1095668.8]  
1    (144064.2, 232323]  
2       [2323, 19162.6]  
3  (1095668.8, 2346777]  
4    (144064.2, 232323]  
5    (144064.2, 232323]  
6   (19162.6, 144064.2]  
7  (1095668.8, 2346777]  
8   (19162.6, 144064.2]  
9       [2323, 19162.6]  
df['recency'] = pd.qcut(df['recency'].values, 5).codes + 1
df['frequency'] = pd.qcut(df['frequency'].values, 5).codes + 1
df['monetary'] = pd.qcut(df['monetary'].values, 5).codes + 1
print df
   userid  recency  frequency  monetary
0   47918        2          3         4
1   48302        5          1         3
2    8873        1          5         1
3   63158        5          2         5
4     364        3          3         3
5   45191        1          5         3
6   21061        2          4         2
7   41356        4          4         5
8   42455        3          2         2
9   65460        1          1         1