daloman - 7 months ago 54

R Question

I have a dataset with several rows and 5 dimensions (all numeric). After normalization, I applied the k-means algorithm to cluster the data.

`clus2_k3 <- kmeans(clus2, centers = 3)`

After this step I would like to visualize the result, but as the data has more than 3 dimensions, it cannot be shown directly in a 2D or 3D plot.

Is there any command or algorithm to plot it or, if not, a way to reduce the number of dimensions without losing the information from the discarded ones?

Answer

Okay, it is completely unreadable as a comment...

```
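library(ggplot2)
data("iris")
# PCA on the four numeric columns, centred and scaled to unit variance
pca_res <- prcomp(as.matrix(iris[, 1:4]), center = TRUE, scale. = TRUE)
# Keep the first two principal components and attach the species labels
plot_data <- cbind(as.data.frame(pca_res$x[, 1:2]), labels = iris[, 5])
ggplot(plot_data, aes(x = PC1, y = PC2, colour = labels)) +
  geom_point()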
```

Edit: You may try different combinations of the `center` and `scale.` parameters; e.g. the result for this data set looks a bit better with both set to `FALSE`.
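The same projection can be coloured by k-means cluster instead of by known labels, which is closer to what the question asks. The following is only a minimal sketch: it assumes the scaled iris measurements as a stand-in for the asker's normalized `clus2`, and the names `clus2_pca`, `km_plot_data` and `km_plot` are introduced here purely for illustration.

```r
library(ggplot2)

# Stand-in for the asker's normalized data (assumption: all columns numeric)
clus2 <- scale(iris[, 1:4])

# k-means starts from random centres, so fix the seed for reproducibility
set.seed(42)
clus2_k3 <- kmeans(clus2, centers = 3)

# The data are already centred and scaled, so prcomp() does not need to redo it
clus2_pca <- prcomp(clus2, center = FALSE, scale. = FALSE)

# First two principal components plus the cluster assignment of each row
km_plot_data <- data.frame(
  clus2_pca$x[, 1:2],
  cluster = factor(clus2_k3$cluster)
)

km_plot <- ggplot(km_plot_data, aes(x = PC1, y = PC2, colour = cluster)) +
  geom_point()
print(km_plot)
```

The clustering is still computed in the full-dimensional space; PCA is used only to draw it, so clusters that overlap in the plot may be well separated along components that are not shown.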

To see the "loss of information" mentioned in the comment, one may use the `summary()` function:

```
summary(pca_res)
# Importance of components:
#                           PC1    PC2     PC3     PC4
# Standard deviation     1.7084 0.9560 0.38309 0.14393
# Proportion of Variance 0.7296 0.2285 0.03669 0.00518
# Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
```

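Here PC1 and PC2 together account for 0.96 of the cumulative proportion of variance, which means about 96% of the "information" (the variance of the scaled data) is retained in those two components, and only about 4% is lost by dropping PC3 and PC4.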
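The proportions reported by `summary()` can also be computed directly from the component standard deviations, which is handy when choosing the number of components programmatically. This is a small sketch; the 95% threshold and the names `var_explained`, `cum_var` and `n_keep` are assumptions made for illustration, not part of the original answer.

```r
pca_res <- prcomp(as.matrix(iris[, 1:4]), center = TRUE, scale. = TRUE)

# The variance of each component is the square of its standard deviation
var_explained <- pca_res$sdev^2 / sum(pca_res$sdev^2)
cum_var <- cumsum(var_explained)

round(var_explained, 4)
round(cum_var, 4)

# Smallest number of components retaining at least 95% of the variance
# (the 0.95 cut-off is an arbitrary choice for this example)
n_keep <- which(cum_var >= 0.95)[1]
n_keep
```

For this data `n_keep` is 2, matching the cumulative proportion of 0.9581 shown above for PC2.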

Source (Stackoverflow)