I am using the elbow method and silhouette scores to find the optimal number of clusters k for my data. With most packages I get 3 clusters, using PAM, k-means and CLARA, whether I look at WSS (within-cluster sum of squares) or silhouette. With Hubert analysis I get 2 clusters. The only strange thing is that the commands below give me a plot that I find a bit confusing. Should I read it as 3 clusters or 4? Any feedback would be appreciated.
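library(factoextra)  # provides fviz_nbclust
# k = 1: total sum of squares; k = 2..10: within-cluster sum of squares
wss <- (nrow(scale(df)) - 1) * sum(apply(scale(df), 2, var))
for (i in 2:10) wss[i] <- sum(kmeans(scale(df), centers = i)$withinss)
fviz_nbclust(scale(df), kmeans, method = "wss")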
The basic idea is that a low within-cluster sum of squares (WSS) signals a good model, in terms of error. However, the more clusters you use, the lower this sum of squared errors (SSE) gets, so the smallest value alone cannot tell you how many clusters to pick.
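For reference, the quantity on the y-axis of the plot can be written as below. The symbols are my own notation, not something from the question: C_j is the set of points assigned to cluster j, and mu_j is the mean (centroid) of that cluster.

```latex
W(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```

Splitting a cluster can only bring points closer to their nearest centroid, which is why the best achievable W(k) does not increase as k grows.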
In simple terms: "when you see that the rate at which the SSE decreases (as the number of clusters goes up) is slowing down, that would be a good point to freeze the number of clusters".
That point is the elbow. In your case it is at 4, because the decline in SSE slows down after 4.
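If the plot is hard to read by eye, it can help to look at the numbers behind it. Here is a minimal sketch of that idea. Since your `df` is not available to me, I am using the built-in `iris` measurements as a stand-in, and the names `x`, `ks` and `drop` are just illustrative; `nstart = 25` is my own choice to make the k-means runs more stable.

```r
# Stand-in data: the question's `df` is not available, so the built-in
# iris measurements are used here purely for illustration.
x <- scale(iris[, 1:4])

set.seed(42)  # k-means starts from random centres
ks <- 1:10

# Total within-cluster sum of squares for each k
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# How much WSS falls when going from k - 1 to k clusters
drop <- -diff(wss)

# One row per k >= 2: the WSS and the improvement over k - 1
print(data.frame(k = ks[-1], wss = round(wss[-1], 1), drop = round(drop, 1)))
```

The elbow is roughly the last k for which `drop` is still large compared with the drops that follow it. When two neighbouring values of k look equally plausible, as with 3 and 4 here, that is a sign the elbow is not sharp, and it makes sense to weigh it against the silhouette result.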
Wikipedia has an excellent overview of how the number of clusters may be determined, in the article "Determining the number of clusters in a data set".