Jamgreen - 2 months ago
Python Question

Group data into classes with numpy

I have a data frame df from which I extract a column mpg.

I want to add a class label/name to each row based on the value of mpg.

I have done it with

import numpy as np

mpg = df.iloc[:, 0]

median = np.percentile(mpg, q=50)
upper_quartile = np.percentile(mpg, q=75)
lower_quartile = np.percentile(mpg, q=25)

# One label per row; every entry is overwritten in the loop below
mpg_class = np.ones((len(mpg), 1))

# Assign the quartile class (0 to 3) row by row
for i, value in enumerate(mpg):
    if value >= upper_quartile:
        mpg_class[i] = 3
    elif value >= median:
        mpg_class[i] = 2
    elif value >= lower_quartile:
        mpg_class[i] = 1
    else:
        mpg_class[i] = 0


but I wonder if it's possible to do this in a smarter way with numpy? I guess it might be possible with np.where or something like that.

Answer

Seems like you are looking for pd.qcut:

pd.qcut(df.iloc[:, 0], [0, 0.25, 0.5, 0.75, 1], [0, 1, 2, 3])
Out: 
0      1
1      0
2      1
3      0
4      0
5      0
6      0
...

The first argument is the series you want to discretize. The second is the list of quantile edges (as fractions between 0 and 1). The third is the labels assigned to the resulting bins: values from 0 to 25% get label 0, values from 25% to 50% get label 1, and so on.
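
Since the question asked about numpy specifically, here is a minimal sketch of a vectorized alternative using np.digitize, shown next to pd.qcut. The sample values and the names edges and qcut_class are made up for illustration and are not from the question. One hedge: pd.qcut bins are closed on the right, while the loop in the question uses >=, so a value that sits exactly on a quartile edge may land in a different class between the two approaches. The data below avoids such ties.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's df (values are made up)
df = pd.DataFrame({"mpg": [18.0, 15.0, 36.0, 16.0, 17.0, 24.0, 22.0, 29.0, 32.0, 27.0]})
mpg = df["mpg"].to_numpy()

# Quartile edges: 25th, 50th and 75th percentiles
edges = np.percentile(mpg, [25, 50, 75])

# np.digitize (right=False) returns i with edges[i-1] <= x < edges[i],
# so a value equal to an edge goes to the higher class, like the loop's >= tests
mpg_class = np.digitize(mpg, edges)

# The pd.qcut call from the answer, for comparison
qcut_class = pd.qcut(df["mpg"], [0, 0.25, 0.5, 0.75, 1], labels=[0, 1, 2, 3])

print(mpg_class)
print(qcut_class.astype(int).to_numpy())
```

np.digitize returns a plain integer array, whereas pd.qcut returns a categorical Series, which is why the comparison above converts it with astype(int).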