daign - 6 months ago 50

Python Question

I have a one-dimensional numpy array with numbers, and I want each number replaced with the index of the quantile it belongs to.

This is my code for quintile indices:

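```
import numpy as np

def get_quintile_indices( a ):
    # Start every entry at the highest possible index (4 for quintiles).
    result = np.ones( a.shape[ 0 ] ) * 4
    # Cutpoints at the 20th, 40th, 60th and 80th percentiles.
    quintiles = [
        np.percentile( a, 20 ),
        np.percentile( a, 40 ),
        np.percentile( a, 60 ),
        np.percentile( a, 80 )
    ]
    # Subtract 1 for every cutpoint the value is less than or equal to.
    for q in quintiles:
        result -= np.less_equal( a, q ) * 1
    return result

a = np.array( [ 58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43 ] )

print( get_quintile_indices( a ) )
```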

Output:

```
[ 2. 2. 4. 4. 0. 0. 3. 0. 3. 4. 1. 1.]
```

As you can see, I start with an array initialized to the highest possible index and, for every quintile cutpoint, subtract 1 from each entry that is less than or equal to that cutpoint. Is there a better way to do this? Is there a built-in function that can map numbers against a list of cutpoints?

Answer

First off, we can generate those `quintiles` in one go, since `np.percentile` accepts a sequence of percentiles -

```
quintiles = np.percentile( a, [20,40,60,80] )
```
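A quick check, assuming NumPy's default linear interpolation, that the single call gives the same cutpoints as the four separate calls in the question. It also returns an ndarray rather than a Python list:

```python
import numpy as np

a = np.array([58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43])

# A single call with a list of percentiles returns an array of cutpoints.
quintiles = np.percentile(a, [20, 40, 60, 80])

# The four separate calls from the question, for comparison.
separate = np.array([np.percentile(a, p) for p in (20, 40, 60, 80)])

# Roughly 36.6, 48, 60.4 and 73.2 with the default linear interpolation.
print(quintiles)
assert np.allclose(quintiles, separate)
```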

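For the final step of getting the indices, we can simply use `np.searchsorted`, which for each value returns the position at which it would have to be inserted into the sorted `quintiles` to keep them ordered. This might be the built-in you were looking for, like so -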

```
out = quintiles.searchsorted(a)
```
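Putting the two steps together, a minimal sketch (the helper name `quintile_indices` is illustrative, not from the answer) that reproduces the question's output. With the default `side='left'`, a value equal to a cutpoint lands in the lower bin, which matches the `np.less_equal` test in the original loop:

```python
import numpy as np

def quintile_indices(a):
    # Hypothetical helper: cutpoints at the 20th/40th/60th/80th percentiles.
    quintiles = np.percentile(a, [20, 40, 60, 80])
    # With side='left', each value v maps to the number of cutpoints
    # strictly less than v, i.e. 4 minus the count of cutpoints >= v.
    return quintiles.searchsorted(a)

a = np.array([58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43])
print(quintile_indices(a))  # [2 2 4 4 0 0 3 0 3 4 1 1]

# Ties: with side='left' a value equal to a cutpoint stays in the lower bin;
# side='right' would push it into the next bin up.
cuts = np.array([1.0, 2.0])
print(cuts.searchsorted([1.0, 1.5, 2.0, 2.5]))                # [0 1 1 2]
print(cuts.searchsorted([1.0, 1.5, 2.0, 2.5], side='right'))  # [1 1 2 2]
```

Note that this returns integer indices, whereas the loop in the question returns floats because it starts from `np.ones`.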

Alternatively, a direct translation of your loop into a vectorized version would use `broadcasting`, like so -

```
# Use broadcasting to perform those comparisons in one go.
# Then, simply sum along the first axis and subtract from 4.
out = 4 - (quintiles[:,None] >= a).sum(0)
```
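The broadcasting version builds a boolean matrix of shape (number of cutpoints, `len(a)`), so it needs more memory than `searchsorted` on large inputs, but it is easy to cross-check. A sketch, assuming bins of equal probability, that generalizes both approaches to any number of bins (the helper `quantile_cutpoints` and the parameter `n_bins` are hypothetical names) and also compares them with `np.digitize(..., right=True)`, which for increasing bins applies the same `bins[i-1] < x <= bins[i]` rule:

```python
import numpy as np

def quantile_cutpoints(a, n_bins):
    # Hypothetical helper: the n_bins - 1 interior cutpoints at equally
    # spaced percentiles, e.g. n_bins=5 gives the 20th, 40th, 60th and 80th.
    return np.percentile(a, 100.0 * np.arange(1, n_bins) / n_bins)

a = np.array([58, 54, 98, 76, 35, 13, 62, 18, 62, 97, 44, 43])

for n_bins in (2, 4, 5, 10):
    cuts = quantile_cutpoints(a, n_bins)
    # Broadcasting: rows are cutpoints, columns are values; count the
    # cutpoints >= each value and subtract that from the highest index.
    via_broadcast = (n_bins - 1) - (cuts[:, None] >= a).sum(axis=0)
    # searchsorted with the default side='left'.
    via_search = cuts.searchsorted(a)
    # digitize with right=True puts x in bin i when cuts[i-1] < x <= cuts[i].
    via_digitize = np.digitize(a, cuts, right=True)
    assert np.array_equal(via_broadcast, via_search)
    assert np.array_equal(via_search, via_digitize)
```

For `n_bins=10` the 70th percentile of this data is exactly 62, which appears twice in `a`, so the loop also exercises the tie case.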

Source (Stackoverflow)