Nick Knauer - 4 months ago
R Question

Optimal Clusters Formula: Finding Equivalent Using NbClust

I have two variables that I calculated from Matrix B:

1) The Correlation Matrix

cor(B)


2) The Hierarchical Clustering of the Dissimilarity Matrix Derived from the Correlation Matrix

I then used the clustConfigurations function from NetCluster to draw the "elbow graph" and determine the optimal number of clusters.

See the code below:

library(NetCluster)

B = matrix(
c(2, 0, 0, 1, 0, 0, 1,
0, 1, 0, 0, 2, 1, 0,
0, 0, 3, 1, 0, 0, 2,
1, 0, 1, 4, 0, 0, 2,
0, 0, 0, 0, 4, 0, 2,
0, 1, 0, 0, 0, 2, 1,
1, 0, 2, 2, 2, 1, 8),
nrow=7,
ncol=7)
colnames(B) = c("A", "B", "C", "D", "E", "F", "G")
rownames(B) = c("A", "B", "C", "D", "E", "F", "G")
B

#   A B C D E F G
# A 2 0 0 1 0 0 1
# B 0 1 0 0 0 1 0
# C 0 0 3 1 0 0 2
# D 1 0 1 4 0 0 2
# E 0 2 0 0 4 0 2
# F 0 1 0 0 0 2 1
# G 1 0 2 2 2 1 8

Correlation_Matrix <- cor(B)                            # correlations between the columns of B
dissimilarity <- 1 - Correlation_Matrix                 # turn similarity into dissimilarity
Correlation_Matrix_dist <- as.dist(dissimilarity)       # keep the lower triangle as a "dist" object
Correlation_Matrix_dist
HClust_Correlation_Matrix <- hclust(Correlation_Matrix_dist)  # complete linkage by default
num_vertices <- ncol(B)
# elbow graph over the possible cluster solutions (NetCluster)
clustered_observed_cors1 <- clustConfigurations(num_vertices, HClust_Correlation_Matrix, Correlation_Matrix)
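For comparison, a rough base-R analogue of an elbow curve can be computed without NetCluster. This is only a sketch under an assumption: it measures fit as the correlation between the observed correlations and a "block model" in which each correlation is replaced by the mean of its cluster-pair block. That measure may not be what clustConfigurations computes, and block_fit is a hypothetical helper. The matrix is rebuilt with the dimnames argument, which is equivalent to setting colnames and rownames as in the question.

```r
# Example matrix from the question (matrix() fills column by column by default)
B <- matrix(
  c(2, 0, 0, 1, 0, 0, 1,
    0, 1, 0, 0, 2, 1, 0,
    0, 0, 3, 1, 0, 0, 2,
    1, 0, 1, 4, 0, 0, 2,
    0, 0, 0, 0, 4, 0, 2,
    0, 1, 0, 0, 0, 2, 1,
    1, 0, 2, 2, 2, 1, 8),
  nrow = 7, ncol = 7,
  dimnames = list(LETTERS[1:7], LETTERS[1:7])
)

observed <- cor(B)                       # observed correlation matrix
hc <- hclust(as.dist(1 - observed))      # complete linkage (hclust's default)

# Hypothetical helper (assumed fit measure, not NetCluster's code):
# cut the tree into k clusters, replace every off-diagonal correlation by the
# mean correlation of its block (pair of clusters), and correlate that block
# model with the observed correlations.
block_fit <- function(observed, hc, k) {
  cl <- cutree(hc, k = k)
  off <- row(observed) != col(observed)  # ignore the diagonal of 1s
  block <- matrix(NA_real_, nrow(observed), ncol(observed))
  for (a in unique(cl)) {
    for (b in unique(cl)) {
      cells <- outer(cl == a, cl == b) & off
      if (any(cells)) block[cells] <- mean(observed[cells])
    }
  }
  cor(block[off], observed[off])
}

ks <- 2:ncol(B)
fit <- sapply(ks, function(k) block_fit(observed, hc, k))
plot(ks, fit, type = "b", xlab = "number of clusters", ylab = "fit")  # look for the elbow
```

With k equal to the number of columns every block holds a single correlation, so the fit reaches 1; the elbow is where adding clusters stops improving the fit much.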


When I tried this with a larger matrix (1213 x 1213), it was too large to run this script, so I decided to use another package called NbClust.

Documentation:

https://cran.r-project.org/web/packages/NbClust/NbClust.pdf

My goal was to recreate the process above with this new package, but I'm not sure whether the code below is equivalent to the code above:

library(NbClust)

# Pass the precomputed dissimilarities via diss (distance must then be NULL)
nbclustering <- NbClust(diss = Correlation_Matrix_dist,
                        distance = NULL,
                        min.nc = 2,
                        max.nc = 20,
                        method = "complete",
                        index = "dunn")

This should give the optimal number of clusters:
nbclustering$Best.nc


Is the above code equivalent to my original code and if not, what change do I need to make?

Thanks!

YCR
Answer

NbClust is a broader function than hclust: besides building the clustering, it focuses on the indices used to assess the final number of clusters.
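To make one of those indices concrete, here is a base-R sketch of the Dunn index that index = "dunn" refers to: the smallest distance between objects in different clusters divided by the largest distance between objects in the same cluster, where larger is better. The helpers dunn_index and pick_k_dunn are hypothetical, and choosing the k with the largest value is assumed here as the selection rule; NbClust's own implementation may differ in details.

```r
# Example matrix from the question
B <- matrix(
  c(2, 0, 0, 1, 0, 0, 1,
    0, 1, 0, 0, 2, 1, 0,
    0, 0, 3, 1, 0, 0, 2,
    1, 0, 1, 4, 0, 0, 2,
    0, 0, 0, 0, 4, 0, 2,
    0, 1, 0, 0, 0, 2, 1,
    1, 0, 2, 2, 2, 1, 8),
  nrow = 7, ncol = 7,
  dimnames = list(LETTERS[1:7], LETTERS[1:7])
)
d <- as.dist(1 - cor(B))

# Dunn index of one partition: minimum between-cluster distance divided by
# the maximum within-cluster distance (largest cluster diameter).
dunn_index <- function(d, cl) {
  m <- as.matrix(d)
  same <- outer(cl, cl, "==")          # TRUE where two objects share a cluster
  off <- row(m) != col(m)              # exclude each object's distance to itself
  min(m[!same]) / max(m[same & off])
}

# Hypothetical helper: cut the complete-linkage tree at each k and keep the k
# with the largest Dunn index. k must stay below the number of objects.
pick_k_dunn <- function(d, min_k = 2, max_k = nrow(as.matrix(d)) - 1) {
  hc <- hclust(d, method = "complete")
  ks <- min_k:max_k
  scores <- sapply(ks, function(k) dunn_index(d, cutree(hc, k = k)))
  names(scores) <- ks
  list(scores = scores, best_k = ks[which.max(scores)])
}

res <- pick_k_dunn(d)
res$scores   # Dunn index for k = 2, ..., 6
res$best_k   # k with the largest index
```

The example has only 7 objects, so k runs up to 6 here; the max.nc = 20 from the question only makes sense for the larger matrix.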

The default method for hclust is "complete", which is the same linkage NbClust uses with the option method = "complete".
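A quick base-R check that leaving out the method argument in hclust gives complete linkage, using the matrix from the question:

```r
# Example matrix from the question
B <- matrix(
  c(2, 0, 0, 1, 0, 0, 1,
    0, 1, 0, 0, 2, 1, 0,
    0, 0, 3, 1, 0, 0, 2,
    1, 0, 1, 4, 0, 0, 2,
    0, 0, 0, 0, 4, 0, 2,
    0, 1, 0, 0, 0, 2, 1,
    1, 0, 2, 2, 2, 1, 8),
  nrow = 7, ncol = 7,
  dimnames = list(LETTERS[1:7], LETTERS[1:7])
)
d <- as.dist(1 - cor(B))

hc_default  <- hclust(d)                        # no method given
hc_complete <- hclust(d, method = "complete")   # explicit complete linkage

hc_default$method                               # "complete"
identical(hc_default$merge, hc_complete$merge)  # TRUE: same merge order
```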

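So the tree is the same, and it is correct to use the result of NbClust to choose the final number of clusters for the clustering obtained with hclust. The selection criterion does differ, though: with index = "dunn", NbClust picks k from the Dunn index, whereas clustConfigurations draws an elbow graph, so the two approaches may not suggest the same k.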
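Once a number of clusters has been chosen, it can be applied to the hclust tree with cutree. In this sketch best_k is a hypothetical placeholder; in practice it would come from the NbClust result, for example nbclustering$Best.nc[1], assumed here to hold the number of clusters when a single index is requested.

```r
# Example matrix from the question
B <- matrix(
  c(2, 0, 0, 1, 0, 0, 1,
    0, 1, 0, 0, 2, 1, 0,
    0, 0, 3, 1, 0, 0, 2,
    1, 0, 1, 4, 0, 0, 2,
    0, 0, 0, 0, 4, 0, 2,
    0, 1, 0, 0, 0, 2, 1,
    1, 0, 2, 2, 2, 1, 8),
  nrow = 7, ncol = 7,
  dimnames = list(LETTERS[1:7], LETTERS[1:7])
)
hc <- hclust(as.dist(1 - cor(B)), method = "complete")

# Placeholder value: replace with the k reported by NbClust (assumed location:
# nbclustering$Best.nc[1])
best_k <- 3
groups <- cutree(hc, k = best_k)
groups   # cluster label for each column of B, named A to G
```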