daloman daloman - 4 days ago 6
R Question

Visualize large dimension clusters in R using k-means

I have a dataset with several rows and 5 dimensions (all numeric). After normalization, I applied the k-means algorithm to cluster the data.

clus2_k3 <- kmeans(clus2, centers = 3)


After this step I would like to visualize the result, but since the data has more than 3 dimensions, a plain 2D or 3D plot cannot show all of them.

Is there a command or algorithm to plot it or, if not, a way to reduce the number of dimensions without losing the information in the discarded ones?

Answer

This was completely unreadable as a comment, so here it is as an answer. One option is to run PCA and plot the first two principal components:

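library(ggplot2)
data("iris")

# PCA on the four numeric columns, centered and scaled to unit variance
pca_res <- prcomp(as.matrix(iris[, 1:4]), center = TRUE, scale. = TRUE)
# Keep the first two principal components and attach the species labels
plot_data <- cbind(as.data.frame(pca_res$x[, 1:2]), labels = iris[, 5])

# Scatter plot of PC1 against PC2, colored by label
ggplot(plot_data, aes(x = PC1, y = PC2, colour = labels)) +
  geom_point()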
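
The example above colors the points by the known species labels, whereas the question is about k-means clusters. A minimal sketch of the same projection colored by cluster assignment follows. The asker's `clus2` is not available, so the scaled numeric iris columns stand in for it here; that substitution and the fixed seed are assumptions for illustration only.

```r
library(ggplot2)

# Stand-in for the asker's normalized data (assumption: `clus2` itself
# is not available, so the four numeric iris columns are used instead).
clus2 <- scale(iris[, 1:4])

# k-means starts from random centers, so fix the seed for repeatability
set.seed(42)
clus2_k3 <- kmeans(clus2, centers = 3)

# The data are already standardized, so no further scaling is requested
pca_res <- prcomp(clus2, center = TRUE, scale. = FALSE)

# First two principal components plus the cluster each row was assigned to
plot_data <- data.frame(
  pca_res$x[, 1:2],
  cluster = factor(clus2_k3$cluster)
)

p <- ggplot(plot_data, aes(x = PC1, y = PC2, colour = cluster)) +
  geom_point()
print(p)
```

With the real data, replacing the `scale(iris[, 1:4])` line with the existing `clus2` object should be the only change needed.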


Edit: You may try different combinations of the center and scale. parameters; for example, the plot looks a bit better with both set to FALSE.
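
For reference, the variant with both parameters set to FALSE might look like the sketch below. The names `pca_raw` and `plot_raw` are made up here to keep it separate from the earlier objects.

```r
library(ggplot2)

# Same iris columns as before, but without centering or scaling
pca_raw <- prcomp(as.matrix(iris[, 1:4]), center = FALSE, scale. = FALSE)
plot_raw <- cbind(as.data.frame(pca_raw$x[, 1:2]), labels = iris[, 5])

p_raw <- ggplot(plot_raw, aes(x = PC1, y = PC2, colour = labels)) +
  geom_point()
print(p_raw)
```

Note that with center = FALSE the components are computed about the origin rather than about the column means, so the proportions reported by summary() are not directly comparable with the centered case.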

Edit:

To see the "loss of information" mentioned in the comment one may use the summary() function:

summary(pca_res)

# Importance of components:
#                           PC1    PC2     PC3     PC4
# Standard deviation     1.7084 0.9560 0.38309 0.14393
# Proportion of Variance 0.7296 0.2285 0.03669 0.00518
# Cumulative Proportion  0.7296 0.9581 0.99482 1.00000

Here PC1 and PC2 together account for a cumulative proportion of variance of 0.9581, which means about 96% of the variance (the "information") is retained in those two components.
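
The proportions in that table can also be derived by hand from the component standard deviations, which may help when the numbers are needed programmatically. A small sketch, where `var_explained` and `cum_explained` are names chosen here for illustration:

```r
pca_res <- prcomp(as.matrix(iris[, 1:4]), center = TRUE, scale. = TRUE)

# The variance of each component is the square of its standard deviation;
# dividing by the total gives the proportion of variance per component
var_explained <- pca_res$sdev^2 / sum(pca_res$sdev^2)

# Running total, corresponding to the "Cumulative Proportion" row
cum_explained <- cumsum(var_explained)

print(round(var_explained, 4))
print(round(cum_explained, 4))
```

The second element of `cum_explained` is the share of variance kept when only PC1 and PC2 are plotted.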
