Lin Ma - 3 months ago
Python Question

OneHotEncoder confusion in scikit-learn

I am using Python 2.7 (miniconda interpreter) and am confused by the OneHotEncoder example below: why is the output of enc.n_values_ [2, 3, 4]? If anyone could help clarify, that would be great.

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder()
>>> enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
OneHotEncoder(categorical_features='all', dtype=<... 'float'>,
handle_unknown='error', n_values='auto', sparse=True)
>>> enc.n_values_
array([2, 3, 4])
>>> enc.feature_indices_
array([0, 2, 5, 9])
>>> enc.transform([[0, 1, 1]]).toarray()
array([[ 1., 0., 0., 1., 0., 0., 1., 0., 0.]])


regards,
Lin

Answer

enc.n_values_ is the number of values per feature.

In this example,

X = 0 0 3
    1 1 0
    0 2 1
    1 0 2

(X's shape is [n_samples, n_features]: each row is a sample, each column is a feature.)

For the first feature, there are 2 values: 0, 1.

For the second feature, there are 3 values: 0, 1, 2.

For the third feature, there are 4 values: 0, 1, 2, 3.

Therefore, enc.n_values_ is [2, 3, 4].
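
The counting above can be reproduced by hand. The following is only a rough sketch in plain Python, not scikit-learn's actual implementation: the names n_values, feature_indices and one_hot are made up for illustration, and it assumes every column takes the values 0..n-1 with no gaps, as in this example. It also shows where feature_indices_ and the transformed row in the question come from.

```python
# Illustrative sketch only (not scikit-learn code): reproduce the numbers
# from the example by counting distinct values per column.
X = [[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]

# zip(*X) transposes the rows into columns, one tuple per feature.
columns = list(zip(*X))
n_values = [len(set(col)) for col in columns]

# Cumulative offsets: where each feature's block starts in the encoded row.
feature_indices = [0]
for n in n_values:
    feature_indices.append(feature_indices[-1] + n)

def one_hot(row):
    """Encode one sample, assuming each column's values are 0..n-1."""
    encoded = [0.0] * feature_indices[-1]
    for start, value in zip(feature_indices, row):
        encoded[start + value] = 1.0
    return encoded

print(n_values)             # [2, 3, 4]
print(feature_indices)      # [0, 2, 5, 9]
print(one_hot([0, 1, 1]))   # ones at positions 0, 3 and 6
```

The 9 encoded columns are just 2 + 3 + 4: the first feature takes columns 0-1, the second takes columns 2-4, and the third takes columns 5-8.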