Jamgreen - 9 months ago 52

Python Question

I have a data frame `df` with a column `mpg`.

I want to add class labels to each row based on the value of `mpg`.

I have done it with

    import numpy as np

    mpg = df.iloc[:, 0]
    median = np.percentile(mpg, q=50)
    upper_quartile = np.percentile(mpg, q=75)
    lower_quartile = np.percentile(mpg, q=25)

    # X and num_observations were not defined in the snippet; assumed here
    X = df.values
    num_observations = len(df)

    mpg_class = np.ones((num_observations, 1))
    for i, element in enumerate(X):
        mpg = element[0]
        if mpg >= upper_quartile:
            mpg_class[i] = 3
        elif mpg >= median:
            mpg_class[i] = 2
        elif mpg >= lower_quartile:
            mpg_class[i] = 1
        else:
            mpg_class[i] = 0

but I wonder if it's possible to do this in a smarter way with `numpy`, for example with `np.where`?

Answer

It seems like you are looking for `pd.qcut`:

```
pd.qcut(df.iloc[:, 0], [0, 0.25, 0.5, 0.75, 1], [0, 1, 2, 3])
Out:
0 1
1 0
2 1
3 0
4 0
5 0
6 0
...
```

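The first argument is the Series you want to discretize. The second is the list of quantile boundaries (here 0, 25%, 50%, 75% and 100%). The last one is the labels assigned to each bin: values between the 0 and 25% quantiles get 0, values between 25% and 50% get 1, and so on.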
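For a version you can run end to end, here is a minimal sketch. The sample values and the column names `mpg_class` and `mpg_class_np` are made up for illustration and are not from the question. It also shows a NumPy-only variant with `np.digitize`, since the question asked about `numpy`. One difference to keep in mind: `pd.qcut` bins are closed on the right, so a value exactly equal to a quartile falls into the lower class, whereas `np.digitize` with its default `right=False` reproduces the `>=` comparisons of the original loop.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's df; mpg is the first column.
df = pd.DataFrame({"mpg": [18.0, 15.0, 36.0, 24.0, 30.0, 21.0, 27.0, 12.0]})

# Quartile-based classes 0..3 with pd.qcut (each bin includes its right edge).
df["mpg_class"] = pd.qcut(
    df["mpg"], [0, 0.25, 0.5, 0.75, 1], labels=[0, 1, 2, 3]
)

# NumPy-only variant: for each value, count how many of the three
# quartile thresholds it reaches (threshold <= value).
mpg = df["mpg"].to_numpy()
thresholds = np.percentile(mpg, [25, 50, 75])
df["mpg_class_np"] = np.digitize(mpg, thresholds)

print(df)
```

Note that `pd.qcut` returns a categorical column; call `.astype(int)` on it if you need plain integers.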

Source (Stackoverflow)