slaw - 5 months ago 32

Python Question

I have some artist names in `data['artist']`, which I convert to integer category codes:

`x = data['artist'].astype('category').cat.codes`

`x.dtype` returns:

`dtype('int32')`

I am getting negative numbers, which suggests some sort of overflow, so I'd like to use `np.int64`:

`x = data['artist'].astype('category').cat.codes.astype(np.int64)`

`x.dtype` gives:

`dtype('int64')`

But it is clear that the int32 values are simply converted to int64, so the negative value is still present:

`x = data['artist'].astype('category').cat.codes.astype(np.int64)`

`x.min()` returns `-1`.

Answer

I think you have `NaN` values in column `artist`. Missing values get the code `-1`, so this is not an overflow:

```
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})
x = data['artist'].astype('category').cat.codes
print(x)
# 0   -1
# 1    1
# 2    2
# 3    0
# 4    1
# 5    2
# dtype: int8
```
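To see why casting to `np.int64` does not help, here is a small sketch on the same toy data (the names `codes`, `codes64` and `n_categories` are just for illustration). Widening the dtype only copies the values, so the `-1` marker for missing entries survives, while the valid codes run from `0` to the number of categories minus one:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})
categorical = data['artist'].astype('category')
codes = categorical.cat.codes

# Widening the dtype copies the values unchanged, so -1 stays -1.
codes64 = codes.astype(np.int64)

# Valid codes are 0 .. n_categories - 1; only missing values get -1.
n_categories = len(categorical.cat.categories)
print(codes64.min(), codes64.max(), n_categories)
# -1 2 3
```

The dtype pandas picks for the codes (`int8` here, `int32` in the question) appears to depend only on how many categories there are, so it says nothing about an overflow.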

To check for `NaN`, you can use `isnull`:

```
print(data[data.artist.isnull()])
#   artist
# 0    NaN
```
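If the `-1` codes are a problem downstream, one possible way to deal with them is to handle the missing names before encoding. The sketch below shows two options on the toy data, and which one fits depends on your use case. The label `'unknown'` is a placeholder I made up, not something from your data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'artist': [np.nan, 'y', 'z', 'x', 'y', 'z']})

# Option 1: drop rows with a missing artist before encoding.
known = data.dropna(subset=['artist'])
codes_known = known['artist'].astype('category').cat.codes

# Option 2: give missing artists their own category.
# 'unknown' is a hypothetical placeholder label.
filled = data['artist'].fillna('unknown')
codes_filled = filled.astype('category').cat.codes

print(codes_known.min(), codes_filled.min())
# 0 0
```

With option 2 the placeholder becomes a regular category, so every row keeps a non-negative code.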

Source (Stackoverflow)
