user1700890 - 9 days ago
Python Question

Pandas - assign histogram bucket to each row

Here is my dataframe:

import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 6, 4, 3, 2, 7]})
buckets = [(0,3),(3,5),(5,9)]


I also have the histogram buckets listed above. I would like to assign each row of the dataframe to the index of the bucket it falls into, so that I get a new column like this:

df['buckets_index'] = [0,0,0,1,2,1,0,0,2]


Of course, I can do it with loops, but I have a fairly big dataframe (2.5 million rows), so I need it to be fast.

Any thoughts?

Answer

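You can use pd.cut, passing the bucket boundaries as a single sorted list of edges. Add labels=False if you only want the bucket index. Bins are closed on the right by default, so a value of 3 falls in (0, 3]: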

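# Bin edges for the intervals (0, 3], (3, 5], (5, 9]
buckets = [0, 3, 5, 9]
# Interval label for each row
df['bucket'] = pd.cut(df['A'], bins=buckets)
# Integer position of the bin for each row
df['bucket_idx'] = pd.cut(df['A'], bins=buckets, labels=False)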

The resulting output:

   A  bucket  bucket_idx
0  1  (0, 3]           0
1  2  (0, 3]           0
2  3  (0, 3]           0
3  4  (3, 5]           1
4  6  (5, 9]           2
5  4  (3, 5]           1
6  3  (0, 3]           0
7  2  (0, 3]           0
8  7  (5, 9]           2
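
If your buckets start out as a list of (low, high) tuples, as in the question, here is a minimal sketch of how they might be turned into the flat edge list that pd.cut expects. It assumes the buckets are contiguous and sorted (each one starts where the previous one ends). The names edges and idx_np are only illustrative. The np.searchsorted variant is an alternative technique, not part of the answer above, and is shown only as a cross-check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 6, 4, 3, 2, 7]})
buckets = [(0, 3), (3, 5), (5, 9)]

# Assumption: buckets are sorted and contiguous, so the edges are the
# first lower bound followed by every upper bound -> [0, 3, 5, 9].
edges = [buckets[0][0]] + [high for _, high in buckets]

# labels=False returns the integer position of the bin for each row.
df['buckets_index'] = pd.cut(df['A'], bins=edges, labels=False)

# Alternative sketch with NumPy: for right-closed bins (low, high],
# side='left' gives the position of the upper edge, so subtract 1.
idx_np = np.searchsorted(edges, df['A'].to_numpy(), side='left') - 1

print(df['buckets_index'].tolist())
print(idx_np.tolist())
```

Both approaches are vectorized, so no Python-level loop over rows is needed. One difference to keep in mind: pd.cut returns NaN for values that fall outside the edges (which turns the column into floats), while the searchsorted version returns -1 or len(edges) - 1 for such values, so you would have to handle those cases yourself.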