vikky - 9 months ago 107

# PCA for categorical features?

My understanding was that PCA can be performed only on continuous features. But while trying to understand the difference between one-hot encoding and label encoding, I came across a post at the following link:

http://datascience.stackexchange.com/questions/9443/when-to-use-one-hot-encoding-vs-labelencoder-vs-dictvectorizor

It states that one-hot encoding followed by PCA is a very good method, which basically means PCA is being applied to categorical features.
So I'm confused. Could someone clarify this?
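To make the distinction in the question concrete, here is a minimal sketch, assuming scikit-learn, of label encoding versus one-hot encoding on a single categorical feature (the `colors` data is hypothetical). Label encoding produces one integer column whose values imply an order the categories don't have; one-hot encoding produces one 0/1 column per category. Note that scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` is its counterpart for input features.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical single categorical feature with three unordered values
colors = np.array(["red", "green", "blue", "green"])

# Label encoding: one integer column; categories are sorted, so blue=0, green=1, red=2
labels = LabelEncoder().fit_transform(colors)
print(labels)  # [2 1 0 1]

# One-hot encoding: one 0/1 column per category (blue, green, red), no implied order
onehot = OneHotEncoder().fit_transform(colors.reshape(-1, 1)).toarray()
print(onehot)
```

The one-hot matrix is ordinary numeric data, which is why a method like PCA can be run on it afterwards.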

PCA is a dimensionality reduction method that can be applied to any set of numeric features, and one-hot encoding turns each categorical feature into numeric 0/1 columns. Here is an example using one-hot encoded (i.e. categorical) data:

``````
from sklearn.preprocessing import OneHotEncoder
from sklearn.decomposition import PCA

# Four samples, three categorical features stored as integer codes
enc = OneHotEncoder()
X = enc.fit_transform([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]).toarray()

print(X)
# [[1. 0. 1. 0. 0. 0. 0. 0. 1.]
#  [0. 1. 0. 1. 0. 1. 0. 0. 0.]
#  [1. 0. 0. 0. 1. 0. 1. 0. 0.]
#  [0. 1. 1. 0. 0. 0. 0. 1. 0.]]

# Reduce the 9 one-hot columns to 3 principal components
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

print(X_pca)
# [[-0.70710678  0.79056942  0.70710678]
#  [ 1.14412281 -0.79056942  0.43701602]
#  [-1.14412281 -0.79056942 -0.43701602]
#  [ 0.70710678  0.79056942 -0.70710678]]
# (the sign of each component may differ between scikit-learn versions)
``````
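To see that nothing category-specific happens inside PCA, here is a minimal sketch that computes PCA by hand on the same one-hot matrix: center each column, take the SVD with NumPy, and project onto the top three axes. It then compares the result with scikit-learn's `PCA`. The names `scores_manual` and `scores_sklearn` are illustrative only. Each principal component is defined only up to sign, so the comparison uses absolute values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import OneHotEncoder

# Same toy data as the answer: 3 categorical columns -> 9 one-hot columns
X = OneHotEncoder().fit_transform([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]]).toarray()

# PCA by hand: center each column, then take the SVD of the centered matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_manual = U[:, :3] * S[:3]  # projections onto the top 3 principal axes

# scikit-learn's PCA on the same matrix
scores_sklearn = PCA(n_components=3).fit_transform(X)

# Components are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(scores_manual), np.abs(scores_sklearn)))
```

The one-hot columns are treated exactly like any other numeric columns: centering plus an SVD. Whether the resulting components are *meaningful* for categorical data is a separate modelling question.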