daloman - 1 year ago 86
R Question

# Visualize large dimension clusters in R using k-means

I have a dataset with several rows and 5 dimensions (all numeric). After normalization, I applied the k-means algorithm to cluster the data.

``````
clus2_k3 <- kmeans(clus2, centers = 3)
``````

After this step I would like to visualize the result, but since the data has more than 3 dimensions, it cannot be shown directly in a 2D or 3D plot.

Is there a command or algorithm to plot it? If not, is there a way to reduce the number of dimensions without losing the information from the ones that are removed?

Okay, it is completely unreadable as a comment...

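``````
library(ggplot2)
data("iris")

# PCA on the four numeric columns, centered and scaled to unit variance
pca_res <- prcomp(as.matrix(iris[, 1:4]), center = TRUE, scale. = TRUE)

# Keep the first two principal components and attach the species labels
plot_data <- cbind(as.data.frame(pca_res$x[, 1:2]), labels = iris[, 5])

ggplot(plot_data, aes(x = PC1, y = PC2, colour = labels)) +
  geom_point()
``````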
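The same idea can be applied to the k-means result from the question by colouring the points by cluster instead of by a known label. This is only a sketch: `clus2` is not shown in the question, so the scaled numeric columns of `iris` stand in for it here, and `set.seed()` and `nstart = 25` are my additions to make the clustering reproducible.

```r
library(ggplot2)

# Stand-in for the asker's normalized data (assumption: any numeric matrix works)
clus2 <- scale(iris[, 1:4])

set.seed(42)  # k-means starts from random centers, so fix the seed
clus2_k3 <- kmeans(clus2, centers = 3, nstart = 25)

# The data is already centered and scaled, so PCA only needs to rotate it
pca_res <- prcomp(clus2, center = FALSE, scale. = FALSE)

# First two principal components plus the cluster assigned to each row
plot_data <- data.frame(
  pca_res$x[, 1:2],
  cluster = factor(clus2_k3$cluster)
)

p <- ggplot(plot_data, aes(x = PC1, y = PC2, colour = cluster)) +
  geom_point()
print(p)
```

Note that the clustering is done in the full-dimensional space; PCA is used only to draw it, so clusters that are well separated in 5 dimensions may still overlap in the 2D projection.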

Edit: You may try different combinations of the `center` and `scale.` parameters; for this dataset, for example, the plot looks a bit better with both set to `FALSE`.
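For reference, that variant could be produced along these lines. It is a sketch that reuses the plotting code above; the names `pca_raw` and `plot_raw` are mine.

```r
library(ggplot2)

# Same PCA call, but without centering or scaling the columns first
pca_raw <- prcomp(as.matrix(iris[, 1:4]), center = FALSE, scale. = FALSE)
plot_raw <- cbind(as.data.frame(pca_raw$x[, 1:2]), labels = iris[, 5])

p_raw <- ggplot(plot_raw, aes(x = PC1, y = PC2, colour = labels)) +
  geom_point()
print(p_raw)
```

Keep in mind that without centering the first component tends to point towards the mean of the data, and without scaling the columns with the largest raw variance dominate, so whether the picture is "better" depends on the data.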

### Edit:

To see the "loss of information" mentioned in the comment one may use the `summary()` function:

``````
summary(pca_res)

# Importance of components:
#                           PC1    PC2     PC3     PC4
# Standard deviation     1.7084 0.9560 0.38309 0.14393
# Proportion of Variance 0.7296 0.2285 0.03669 0.00518
# Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
``````

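Here PC1 and PC2 together account for a cumulative proportion of variance of 0.96, which means that about 96% of the variance (the "information") is retained in those two components and only about 4% is lost by dropping PC3 and PC4.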
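If you need those numbers programmatically rather than reading them off the printed summary, they can be derived from the standard deviations stored in the `prcomp` object. A minimal sketch, with `var_explained` and `cum_var` as names of my own choosing:

```r
pca_res <- prcomp(as.matrix(iris[, 1:4]), center = TRUE, scale. = TRUE)

# The variance of each component is the square of its standard deviation
var_explained <- pca_res$sdev^2 / sum(pca_res$sdev^2)

# Running total: how much variance the first k components retain
cum_var <- cumsum(var_explained)

round(cum_var, 4)
```

The second element of `cum_var` corresponds to the "Cumulative Proportion" entry for PC2 in the summary above.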
