I have a dataset with several rows and 5 dimensions (all numeric). After normalization, I applied the k-means algorithm to cluster the data.
Okay, it is completely unreadable as a comment...
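
    library(ggplot2)

    data("iris")

    # PCA on the four numeric columns, centred and scaled to unit variance
    pca_res <- prcomp(as.matrix(iris[, 1:4]), center = TRUE, scale. = TRUE)

    # Keep the first two principal components and attach the species labels
    plot_data <- cbind(as.data.frame(pca_res$x[, 1:2]), labels = iris[, 5])

    ggplot(plot_data, aes(x = PC1, y = PC2, colour = labels)) +
      geom_point()
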
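To tie this back to the k-means setting in the question, the same projection can be coloured by cluster assignment instead of by species. The following is only a sketch: it uses `iris` as a stand-in for the 5-dimensional data, and `centers = 3`, `nstart = 25` and the seed are assumptions rather than values from the question.

```r
library(ggplot2)

data("iris")
features <- scale(iris[, 1:4])      # normalise, as described in the question

set.seed(42)                        # k-means starts from random centres
# centers = 3 and nstart = 25 are illustrative assumptions
km <- kmeans(features, centers = 3, nstart = 25)

# The data are already centred and scaled, so prcomp() defaults are enough
pca_res <- prcomp(features)

# First two principal components plus the cluster each row was assigned to
plot_data <- cbind(as.data.frame(pca_res$x[, 1:2]),
                   cluster = factor(km$cluster))

p <- ggplot(plot_data, aes(x = PC1, y = PC2, colour = cluster)) +
  geom_point()
print(p)
```

Clustering is done in the full feature space and PCA is used only for display, so clusters that overlap in the plot may still be separated along the dropped components.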
Edit: You may try different combinations of the `center` and `scale.` params of `prcomp`; e.g. this data set looks a bit better with both set to `TRUE`, as in the call above.
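One rough way to compare the combinations of `center` and `scale.` without redrawing the plot each time is to check how much of the total variance the first two components capture under each setting. This is a sketch, not a rule for choosing the parameters; the `settings` data frame and the `var_pc1_pc2` column are names made up for the illustration.

```r
data("iris")
x <- as.matrix(iris[, 1:4])

# All four combinations of the two prcomp() switches
settings <- expand.grid(center = c(FALSE, TRUE), scale. = c(FALSE, TRUE))

# Share of total variance captured by PC1 and PC2 for each combination
settings$var_pc1_pc2 <- mapply(function(center, scale.) {
  res <- prcomp(x, center = center, scale. = scale.)
  sum(res$sdev[1:2]^2) / sum(res$sdev^2)
}, settings$center, settings$scale.)

print(settings)
```

With `center = FALSE` the components describe spread around the origin rather than around the mean, so those rows are only loosely comparable with the centred ones; looking at the resulting plot is still the better check.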
To see the "loss of information" mentioned in the comment, one may use the `summary` function:

    summary(pca_res)
    # Importance of components:
    #                           PC1    PC2     PC3     PC4
    # Standard deviation     1.7084 0.9560 0.38309 0.14393
    # Proportion of Variance 0.7296 0.2285 0.03669 0.00518
    # Cumulative Proportion  0.7296 0.9581 0.99482 1.00000

Here PC1 and PC2 together reach a cumulative proportion of variance of 0.96, which means about 96% of the variance (the "information") is retained in those two components, and only about 4% is lost by dropping PC3 and PC4.
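The same figures can be computed directly from the fitted object, which is handy when the number of components to keep has to be chosen in code. A minimal sketch, assuming the `prcomp` call from above; the 0.95 threshold and the names `var_explained`, `cum_explained` and `n_keep` are illustrative choices.

```r
data("iris")
pca_res <- prcomp(as.matrix(iris[, 1:4]), center = TRUE, scale. = TRUE)

# Variance of each component is its standard deviation squared
var_explained <- pca_res$sdev^2 / sum(pca_res$sdev^2)

# Running total; should match the "Cumulative Proportion" row of summary()
cum_explained <- cumsum(var_explained)
print(round(cum_explained, 4))

# Smallest number of components reaching the (assumed) 95% threshold
n_keep <- which(cum_explained >= 0.95)[1]
print(n_keep)
```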