tnabdb - 1 year ago 57

Python Question

Suppose I have a numpy array like: [11, 30, 25]. These numbers represent categories of the objects corresponding to the indices. I know there are just 20 categories but for some reason they are numbered from 11 to 30. I'd like to convert them to numbers in 0:19 and back. What would be a pythonic way to do this? Preferably in numpy.

EDIT: This is just a small example of a bigger problem, where the number of categories is in the thousands and some categories are never represented, so the maximum id will be the number of unique existing categories.

Answer Source

To convert easily back and forth, I would use the `LabelEncoder` class from the `sklearn.preprocessing` module:

```
In [7]: from sklearn.preprocessing import LabelEncoder
In [8]: encoder = LabelEncoder()
In [9]: encoder.fit(range(11,31))
Out[9]: LabelEncoder()
In [10]: encoder.transform([11,30,25])
Out[10]: array([ 0, 19, 14])
In [11]: encoder.inverse_transform([18, 1, 15])
Out[11]: array([29, 12, 26])
```
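Since the question asks for something in numpy, and the EDIT says some category ids never occur, here is a minimal numpy-only sketch rather than part of the original answer. It assumes the set of categories should be taken from the data itself; the array `labels` and the variable names are made up for illustration. `np.unique` with `return_inverse=True` gives both the sorted unique ids and the 0-based codes, and indexing the unique ids with the codes maps back.

```python
import numpy as np

# Hypothetical sample data: original category ids with gaps and repeats.
labels = np.array([11, 30, 25, 11, 30])

# categories: sorted unique original ids
# codes: for each label, its position in `categories` (0 .. n_unique - 1)
categories, codes = np.unique(labels, return_inverse=True)
print(categories)  # [11 25 30]
print(codes)       # [0 2 1 0 2]

# Convert back: index the unique ids with the codes.
restored = categories[codes]

# Encode further data that only uses ids already present in `categories`.
# Note: searchsorted does not check membership, so unseen ids would be
# mapped silently to a wrong position.
new_labels = np.array([25, 11])
new_codes = np.searchsorted(categories, new_labels)
print(new_codes)   # [1 0]
```

With this approach the largest code is the number of distinct ids present minus one, no matter how sparse the original numbering is.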