patrick - 11 months ago 154

Python Question

I am testing out the Birch clustering algorithm implemented in scikit-learn. I am a little confused about a statement in the manual regarding the parameter `n_clusters`:

`n_clusters : int, instance of sklearn.cluster model, default None`

On the other hand, the initial description of the algorithm is as follows:

`class sklearn.cluster.Birch(threshold=0.5, branching_factor=50, n_clusters=3, compute_labels=True, copy=True)`

I would take that to mean that `n_clusters` defaults to 3, not `None`.

Am I misreading this in some way? What is the logic behind this?

(I guess it does not help that I am not 100% sure what this setting actually does; I understood it to apply a kind of additional fine-clustering to the outcome of the Birch method.)

Any help is much appreciated!

Answer

Yes, you are right. The documented default should be 3 instead of `None`; the value in the class signature is the one that applies.

[ https://github.com/scikit-learn/scikit-learn/issues/6635 ]

When `n_clusters` is an integer, the final clustering step is an Agglomerative Clustering model whose `n_clusters` is set to that integer; it is applied to the subclusters found by Birch.

When `n_clusters = None`, the further clustering step is not performed and the subclusters are returned as they are.
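To see the difference between the two settings, here is a minimal sketch. The toy data (three well-separated blobs from `make_blobs`) and the `threshold=0.3` value are my own assumptions, chosen only so that Birch produces more than three subclusters; they do not come from the question.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Toy data (assumption for illustration): three well-separated 2-D blobs.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [5, 5], [10, 0]],
    cluster_std=0.5,
    random_state=0,
)

# n_clusters=None: no final clustering step, so each label refers
# directly to one of the subclusters in the CF tree leaves.
raw = Birch(threshold=0.3, n_clusters=None).fit(X)
n_subclusters = len(raw.subcluster_centers_)
n_raw_labels = len(np.unique(raw.labels_))

# n_clusters=3: the subclusters are grouped further by agglomerative
# clustering, so only three distinct labels remain.
merged = Birch(threshold=0.3, n_clusters=3).fit(X)
n_merged_labels = len(np.unique(merged.labels_))

print("subclusters:", n_subclusters)
print("labels with n_clusters=None:", n_raw_labels)
print("labels with n_clusters=3:", n_merged_labels)
```

With `n_clusters=None` the number of labels depends on `threshold` and `branching_factor`, so it usually takes some tuning; with an integer you get that many clusters as long as Birch found at least that many subclusters.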