ihsansat - 5 months ago 34x
Python Question

# How transform value column to quantile at pandas python?

I use pandas to analyze my data, and execute:

df = pd.DataFrame(datas, columns=['userid', 'recency', 'frequency', 'monetary'])

print df
userid recency frequency monetary
0 47918 9 53 788778
1 48302 85 10 232323
2 8873 3 79 2323
3 63158 23 23 2323232
4 364 14 43 232323
5 45191 1 75 224455
6 21061 9 64 23367
7 41356 22 55 2346777
8 42455 14 30 23478
9 65460 3 16 2345

I need to transform value
recency
frequency
and
monetary
into value in range 1-5. so output is

userid recency frequency monetary
0 47918 1 2 3
1 48302 2 1 2
2 8873 3 4 5
3 63158 2 2 2
4 364 5 4 2
5 45191 1 5 4
6 21061 4 4 3
7 41356 3 5 4
8 42455 5 3 5
9 65460 3 1 2

how can do that in python ?

thx

IIUC you need qcut with codes, last need add 1, because minimal value is 1 and maximal 5:

df['recency1'] = pd.qcut(df['recency'].values, 5)
df['frequency1'] = pd.qcut(df['frequency'].values, 5)
df['monetary1'] = pd.qcut(df['monetary'].values, 5)
print df
userid  recency  frequency  monetary    recency1    frequency1  \
0   47918        9         53    788778      (3, 9]  (37.8, 53.8]
1   48302       85         10    232323  (22.2, 85]    [10, 21.6]
2    8873        3         79      2323      [1, 3]    (66.2, 79]
3   63158       23         23   2323232  (22.2, 85]  (21.6, 37.8]
4     364       14         43    232323     (9, 14]  (37.8, 53.8]
5   45191        1         75    224455      [1, 3]    (66.2, 79]
6   21061        9         64     23367      (3, 9]  (53.8, 66.2]
7   41356       22         55   2346777  (14, 22.2]  (53.8, 66.2]
8   42455       14         30     23478     (9, 14]  (21.6, 37.8]
9   65460        3         16      2345      [1, 3]    [10, 21.6]

monetary1
0   (232323, 1095668.8]
1    (144064.2, 232323]
2       [2323, 19162.6]
3  (1095668.8, 2346777]
4    (144064.2, 232323]
5    (144064.2, 232323]
6   (19162.6, 144064.2]
7  (1095668.8, 2346777]
8   (19162.6, 144064.2]
9       [2323, 19162.6]
df['recency'] = pd.qcut(df['recency'].values, 5).codes + 1
df['frequency'] = pd.qcut(df['frequency'].values, 5).codes + 1
df['monetary'] = pd.qcut(df['monetary'].values, 5).codes + 1
print df
userid  recency  frequency  monetary
0   47918        2          3         4
1   48302        5          1         3
2    8873        1          5         1
3   63158        5          2         5
4     364        3          3         3
5   45191        1          5         3
6   21061        2          4         2
7   41356        4          4         5
8   42455        3          2         2
9   65460        1          1         1