Nick Knauer - 2 months ago 13x
R Question

# Optimal Clusters Formula: Finding Equivalent Using NbClust

I have two variables that I calculated from Matrix B:

1) The Correlation Matrix

`cor(B)`

2) The Hierarchical Cluster of the Dissimilarity Matrix from the Correlation Matrix

I then used the `clustConfigurations` function to produce the "elbow graph" and determine the optimal number of clusters.

See the code below:

``````
library(NetCluster)

B = matrix(
c(2, 0, 0, 1, 0, 0, 1,
0, 1, 0, 0, 2, 1, 0,
0, 0, 3, 1, 0, 0, 2,
1, 0, 1, 4, 0, 0, 2,
0, 0, 0, 0, 4, 0, 2,
0, 1, 0, 0, 0, 2, 1,
1, 0, 2, 2, 2, 1, 8),
nrow=7,
ncol=7)
colnames(B) = c("A", "B", "C", "D", "E", "F", "G")
rownames(B) = c("A", "B", "C", "D", "E", "F", "G")
B

# Printed output (matrix() fills column by column):
#   A B C D E F G
# A 2 0 0 1 0 0 1
# B 0 1 0 0 0 1 0
# C 0 0 3 1 0 0 2
# D 1 0 1 4 0 0 2
# E 0 2 0 0 4 0 2
# F 0 1 0 0 0 2 1
# G 1 0 2 2 2 1 8

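# Turn the correlations into dissimilarities and cluster them
# (hclust uses complete linkage when no method is given)
Correlation_Matrix <- cor(B)
dissimilarity <- 1 - Correlation_Matrix
Correlation_Matrix_dist <- as.dist(dissimilarity)
Correlation_Matrix_dist
HClust_Correlation_Matrix <- hclust(Correlation_Matrix_dist)

# Evaluate the cluster configurations used for the elbow graph
clustered_observed_cors = vector()
num_vertices <- ncol(B)
clustered_observed_cors1 <- clustConfigurations(num_vertices, HClust_Correlation_Matrix, Correlation_Matrix)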
``````

When I tried this with a larger matrix (1213 x 1213), the matrix was too large for the script to run, so I decided to use another package called `NbClust`.

Documentation:

https://cran.r-project.org/web/packages/NbClust/NbClust.pdf

My goal was to recreate the process above with this new package, but I'm not sure whether the code below is equivalent to the code above:

``````
library(NbClust)

nbclustering<-NbClust(diss = Correlation_Matrix_dist,
distance = NULL,
min.nc=2,
max.nc=20,
method = "complete",
index = "dunn")

# This gives the optimal number of clusters:
nbclustering$Best.nc
``````

Is the above code equivalent to my original code and if not, what change do I need to make?

Thanks!

`NbClust` is a broader function than `hclust`: it focuses on the indices used to decide the final number of clusters.

The default method for `hclust` is `"complete"`.

`NbClust` uses the same method when it is called with the option `method = "complete"`.
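The default linkage can be checked in base R with the matrix from the question. This is only a small sketch: it does not call `NbClust` or `NetCluster`, and the variable names (`hc_default`, `hc_complete`, `same_method`, `same_tree`) are made up for the illustration.

```r
# Matrix B from the question (matrix() fills it column by column).
B <- matrix(
  c(2, 0, 0, 1, 0, 0, 1,
    0, 1, 0, 0, 2, 1, 0,
    0, 0, 3, 1, 0, 0, 2,
    1, 0, 1, 4, 0, 0, 2,
    0, 0, 0, 0, 4, 0, 2,
    0, 1, 0, 0, 0, 2, 1,
    1, 0, 2, 2, 2, 1, 8),
  nrow = 7, ncol = 7,
  dimnames = list(LETTERS[1:7], LETTERS[1:7]))

# Same dissimilarity as in the question: 1 - correlation.
d <- as.dist(1 - cor(B))

hc_default  <- hclust(d)                       # no method argument
hc_complete <- hclust(d, method = "complete")  # linkage stated explicitly

# Both calls record the same linkage and build the same merge tree.
same_method <- identical(hc_default$method, hc_complete$method)
same_tree   <- identical(hc_default$merge, hc_complete$merge) &&
  isTRUE(all.equal(hc_default$height, hc_complete$height))
print(c(same_method = same_method, same_tree = same_tree))
```

If both values are `TRUE`, the tree from the original script matches what complete linkage produces on the same dissimilarity object.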

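So it is correct to use the result of `NbClust` to choose the final number of clusters for the clustering obtained with `hclust`. Keep in mind that only the clustering is shared: with `index = "dunn"` the number of clusters is chosen by the Dunn index rather than by the elbow graph from `clustConfigurations`, so the two criteria need not suggest the same number.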
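To make the idea concrete, here is a rough base R sketch of scoring each candidate number of clusters with a Dunn-style index and then cutting the `hclust` tree at the best one. It is not `NbClust`'s code: `dunn_index` is a hypothetical helper written for this example (smallest between-cluster distance divided by the largest within-cluster distance), and `NbClust`'s own implementation may differ in details.

```r
# Matrix B from the question (matrix() fills it column by column).
B <- matrix(
  c(2, 0, 0, 1, 0, 0, 1,
    0, 1, 0, 0, 2, 1, 0,
    0, 0, 3, 1, 0, 0, 2,
    1, 0, 1, 4, 0, 0, 2,
    0, 0, 0, 0, 4, 0, 2,
    0, 1, 0, 0, 0, 2, 1,
    1, 0, 2, 2, 2, 1, 8),
  nrow = 7, ncol = 7,
  dimnames = list(LETTERS[1:7], LETTERS[1:7]))

d  <- as.dist(1 - cor(B))
hc <- hclust(d, method = "complete")

# Hypothetical helper (not part of any package): Dunn-style index for one
# partition. Larger values mean compact, well-separated clusters.
dunn_index <- function(d, labels) {
  m        <- as.matrix(d)
  same     <- outer(labels, labels, "==")   # TRUE where both items share a cluster
  off_diag <- row(m) != col(m)              # ignore each item's distance to itself
  within   <- m[same & off_diag]
  between  <- m[!same]
  if (length(within) == 0) return(NA_real_) # every cluster is a singleton
  min(between) / max(within)
}

# Score every candidate number of clusters on the same tree.
ks     <- 2:(ncol(B) - 1)
scores <- sapply(ks, function(k) dunn_index(d, cutree(hc, k = k)))
names(scores) <- ks

best_k   <- ks[which.max(scores)]
clusters <- cutree(hc, k = best_k)   # final cluster membership from hclust
print(scores)
print(clusters)
```

With the real 1213 x 1213 matrix, the value in `nbclustering$Best.nc` would take the place of `best_k` in the `cutree` call, applied to the tree built by `hclust` on the same dissimilarity object.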