Lin Ma - 1 year ago 87

Python Question

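Scikit-learn's `LabelEncoder` and `OneHotEncoder` can turn categorical values in a numpy array into `0,1` indicator columns (example further down).

My question is, is there a neat API to convert a column of a pandas data frame into `0, 1` encoding? For example, given `123.csv`, I want the string columns `c_a`, `c_b` and `c_c` each expanded into `0, 1` columns.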

Code:

```
import pandas as pd

# header=None treats the first line (c_a,c_b,c_c,c_d) as a data row
sample = pd.read_csv('123.csv', sep=',', header=None)
print(sample.dtypes)
```

`123.csv` content:

```
c_a,c_b,c_c,c_d
hello,python,pandas,1.2
hi,c++,vector,1.2
```
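Note that `header=None` makes pandas treat the first line of `123.csv` as data, which is why `c_a`, `c_b` and `c_c` show up as category values in the `get_dummies` output. A minimal sketch of reading the same content with the default `header=0` instead; the file is simulated with `io.StringIO` (an assumption, so the snippet runs without `123.csv` on disk):

```python
import io

import pandas as pd

# Stand-in for 123.csv so the sketch is self-contained
csv_text = """c_a,c_b,c_c,c_d
hello,python,pandas,1.2
hi,c++,vector,1.2
"""

# header=0 (the default) uses the first line as column names
sample = pd.read_csv(io.StringIO(csv_text), sep=",", header=0)
print(sample.dtypes)
# c_a, c_b and c_c hold strings; c_d is parsed as float64
```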

`LabelEncoder` and `OneHotEncoder` example with numpy:

```
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

S = np.array(['b', 'a', 'c'])

# Map each string label to an integer code: 'a' -> 0, 'b' -> 1, 'c' -> 2
le = LabelEncoder()
S = le.fit_transform(S)
print(S)

# Expand the integer codes into one 0/1 column per class
ohe = OneHotEncoder()
one_hot = ohe.fit_transform(S.reshape(-1, 1)).toarray()
print(one_hot)
```

which results in:

```
[1 0 2]
[[ 0. 1. 0.]
 [ 1. 0. 0.]
 [ 0. 0. 1.]]
```
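For comparison, pandas can produce the same integer codes as `LabelEncoder` without scikit-learn: converting a Series to the `category` dtype infers and sorts the string categories, and `.cat.codes` gives each value's position among them. A minimal sketch (variable names are illustrative):

```python
import pandas as pd

s = pd.Series(["b", "a", "c"])

# Categories are inferred from the values and sorted: ['a', 'b', 'c']
cat = s.astype("category")

# Integer code of each value within the sorted categories
codes = cat.cat.codes

print(list(cat.cat.categories))  # ['a', 'b', 'c']
print(codes.tolist())            # [1, 0, 2], same as LabelEncoder above
```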

I also tried `get_dummies`, but the encoded values come out as `0.0` and `1.0`, i.e. as `float`, and I would like integers:

```
   0_c_a  0_hello  0_hi  0_ho  1_c++  1_c_b  1_java  1_python  2_c_c  2_numpy  \
0    1.0      0.0   0.0   0.0    0.0    1.0     0.0       0.0    1.0      0.0
1    0.0      1.0   0.0   0.0    0.0    0.0     0.0       1.0    0.0      0.0
2    0.0      0.0   1.0   0.0    0.0    0.0     1.0       0.0    0.0      0.0
3    0.0      0.0   0.0   1.0    1.0    0.0     0.0       0.0    0.0      1.0
```

Answer

Are you looking for `get_dummies`?

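```
import pandas as pd

s = pd.Series(["a", "b", "a", "c"])
# One column per distinct value; the result dtype depends on the
# pandas version (bool since pandas 2.0)
pd.get_dummies(s)
```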
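`get_dummies` also accepts a whole DataFrame. A sketch applied to the question's `123.csv` content (simulated with `io.StringIO`, an assumption), passing `columns=` so only the string columns are encoded and `c_d` is left as-is:

```python
import io

import pandas as pd

# Stand-in for 123.csv, read with its first line as the header
csv_text = """c_a,c_b,c_c,c_d
hello,python,pandas,1.2
hi,c++,vector,1.2
"""
df = pd.read_csv(io.StringIO(csv_text))

# One indicator column per (column, value) pair, named <column>_<value>;
# columns not listed in `columns=` are kept and placed first
encoded = pd.get_dummies(df, columns=["c_a", "c_b", "c_c"])
print(encoded.columns.tolist())
# ['c_d', 'c_a_hello', 'c_a_hi', 'c_b_c++', 'c_b_python', 'c_c_pandas', 'c_c_vector']
```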

If you want `ints`:

```
import numpy as np

pd.get_dummies(s).astype(np.uint8)
```
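Since pandas 0.23, `get_dummies` also takes a `dtype` argument, which avoids the separate `astype` call. A minimal sketch:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "a", "c"])

# Request unsigned 8-bit integer indicators directly
dummies = pd.get_dummies(s, dtype=np.uint8)
print(dummies)
print(dummies.dtypes)  # every column is uint8
```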

Reference: Pandas get_dummies to output dtype integer/bool instead of float