Luna - 7 months ago 46

R Question

I am exploring the iris data set in RStudio and would like some clarification on the following two lines of code:

`cluster_iris <- kmeans(iris[, 1:4], centers = 3)`

`iris$ClusterM <- as.factor(cluster_iris$cluster)`

I think the first one performs a k-means cluster analysis using all the rows (cases) of the data set but only the first four columns, with 3 clusters.
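A quick way to confirm what `iris[, 1:4]` selects is to inspect it directly. This is a minimal sketch; `x` is an illustrative variable name, not part of the original code.

```r
# iris[, 1:4]: empty row index = keep every row; 1:4 = keep only columns 1 to 4
x <- iris[, 1:4]
dim(x)    # number of rows and columns kept
names(x)  # the four numeric measurement columns
# Column 5 (Species) is a factor; it is left out because kmeans() needs numeric data
class(iris$Species)
```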

However, I'm not sure what the second line is doing. Is the first one just specifying the settings for the analysis, and the second one actually executing it (i.e. performing the k-means)?

Any help is appreciated.

Answer

The first line actually performs the cluster analysis and stores the cluster labels in a component called `cluster_iris$cluster`, which is just an integer vector with one label per row.
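To see that `kmeans()` has already run by the time the first line returns, you can inspect the object it produces. This is a sketch; the `set.seed(1)` call is an addition (any seed works) so the random starting centres, and therefore the labels, are repeatable.

```r
set.seed(1)  # kmeans() picks random starting centres; fixing the seed makes results repeatable
cluster_iris <- kmeans(iris[, 1:4], centers = 3)

# The result is a fitted "kmeans" object, not just a set of options
class(cluster_iris)
# One integer label (1, 2 or 3) for each of the rows of iris
length(cluster_iris$cluster)
head(cluster_iris$cluster)
# Other components: the size of each cluster and the 3 x 4 matrix of cluster means
cluster_iris$size
cluster_iris$centers
```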

The second line attaches that cluster number to each row of the original data set as a categorical label (a factor). So now your iris data has all the petal and sepal measurements plus a cluster index in a column called `"ClusterM"`.
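Here is a small sketch of what the second line adds, again assuming an arbitrary `set.seed(1)` for repeatability. Assigning to `iris$ClusterM` creates a modified copy of `iris` in your workspace; the built-in data set itself is untouched.

```r
set.seed(1)
cluster_iris <- kmeans(iris[, 1:4], centers = 3)

# as.factor() turns the integer labels into a categorical variable with levels "1", "2", "3"
iris$ClusterM <- as.factor(cluster_iris$cluster)

names(iris)              # the workspace copy of iris now has a sixth column, ClusterM
levels(iris$ClusterM)
# Cross-tabulate the clusters against the known species
table(iris$Species, iris$ClusterM)
```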

```
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species ClusterM
1          5.1         3.5          1.4         0.2  setosa        1
2          4.9         3.0          1.4         0.2  setosa        3
3          4.7         3.2          1.3         0.2  setosa        3
4          4.6         3.1          1.5         0.2  setosa        3
```
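Note that the numbers in `ClusterM` are only labels. Because `kmeans()` starts from randomly chosen centres, running it again can number the same groups differently, so your output may not match the one above. The sketch below (seeds and the names `run1`, `run2`, `run3` are illustrative) uses `set.seed()` for repeatability and `nstart = 25`, which tries 25 random starts and keeps the best solution.

```r
# Same seed gives identical labels
set.seed(123)
run1 <- kmeans(iris[, 1:4], centers = 3, nstart = 25)$cluster
set.seed(123)
run2 <- kmeans(iris[, 1:4], centers = 3, nstart = 25)$cluster
identical(run1, run2)

# A different seed may find similar groups but number them differently;
# a cross-table shows how the two sets of labels correspond
set.seed(456)
run3 <- kmeans(iris[, 1:4], centers = 3, nstart = 25)$cluster
table(run1, run3)
```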

Source (Stack Overflow)