luca luca - 3 months ago 17
Python Question

DataFrame: add column whose values are the quantile number/rank of an existing column?

I have a DataFrame with some columns. I'd like to add a new column where each row value is the quantile rank of one existing column.

I can use DataFrame.rank to rank a column, but I don't know how to turn that rank into a quantile number and add it as a new column.

Example: if this is my DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

   a    b
0  1    1
1  2   10
2  3  100
3  4  100


and I'd like to know the quantile number (using 2 quantiles) of column b. I'd expect this result:

   a    b  quantile
0  1    1         1
1  2   10         1
2  3  100         2
3  4  100         2

Answer

I discovered it is quite easy:

df['quantile'] = pd.qcut(df['b'], 2, labels=False)

   a    b  quantile
0  1    1         0
1  2   10         0
2  3  100         1
3  4  100         1
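
Note that with labels=False the bin numbers start at 0, whereas the question expected them to start at 1. A minimal sketch of one way to get 1-based numbers, assuming the same example DataFrame as in the question, is to add 1 to the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), columns=['a', 'b'])

# labels=False makes qcut return integer bin codes starting at 0;
# adding 1 shifts them to the 1-based numbering used in the question.
df['quantile'] = pd.qcut(df['b'], 2, labels=False) + 1

print(df)
#    a    b  quantile
# 0  1    1         1
# 1  2   10         1
# 2  3  100         2
# 3  4  100         2
```

If the column has many repeated values, the computed quantile edges may not be unique and qcut raises a ValueError by default; passing duplicates='drop' is one way around that, at the cost of getting fewer bins than requested.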

It is also worth reading up on the "difference between pandas.qcut and pandas.cut".
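
As a rough illustration of that difference: qcut picks bin edges from the data's quantiles, so each bin holds about the same number of rows, while cut splits the value range into equal-width intervals. The sketch below uses a small made-up Series (hypothetical data, not from the question) where the two give different results:

```python
import pandas as pd

# Hypothetical data, chosen so the two functions disagree.
s = pd.Series([1, 2, 3, 100])

# Quantile-based: the split point is the median (2.5), two rows per bin.
by_quantile = pd.qcut(s, 2, labels=False)

# Width-based: the range 1..100 is split at its midpoint (50.5).
by_width = pd.cut(s, 2, labels=False)

print(by_quantile.tolist())  # [0, 0, 1, 1]
print(by_width.tolist())     # [0, 0, 0, 1]
```

On the question's own column b (1, 10, 100, 100) both functions happen to give the same bins, which is why a different Series is used here.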
