patrick - 6 months ago
Python Question

Understanding settings of Birch clustering in Scikit Learn

I am testing out the Birch clustering algorithm implemented in Scikit Learn. I am a little confused about a statement in the manual. Regarding the parameter n_clusters, it states:

n_clusters : int, instance of sklearn.cluster model, default None


On the other hand, the initial description of the algorithm is as follows:


class sklearn.cluster.Birch(threshold=0.5, branching_factor=50, n_clusters=3, compute_labels=True, copy=True)


I would take that to mean that n_clusters is by default set to 3, not None. This is also what it seems to be doing when I run it.

Am I misreading this in some way? What is the logic behind this?

(I guess it does not help that I am not 100% sure what this setting actually does; I understood it to apply a kind of additional fine-clustering to the outcome of the Birch method.)

Any help is much appreciated!

Answer

Yes, you are right. The parameter description in the manual is wrong: the default value is 3, as the class signature shows, not None.

[ https://github.com/scikit-learn/scikit-learn/issues/6635 ]

When n_clusters is an integer, the final clustering step fits an AgglomerativeClustering model on the subclusters, with its own n_clusters set to that integer.

When n_clusters is None, this final clustering step is skipped and the subclusters are returned as they are.
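To see the difference in practice, here is a minimal sketch that fits Birch twice on the same data, once with n_clusters=None and once with n_clusters=3. The toy data from make_blobs, the blob centers, and the variable names are my own choices for illustration, not something from the question.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Illustrative toy data: six well-separated blobs (centers chosen arbitrarily).
centers = [[0, 0], [5, 5], [-5, 5], [5, -5], [-5, -5], [0, 8]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.4, random_state=0)

# n_clusters=None: no final clustering step, each subcluster is its own cluster.
raw = Birch(threshold=0.5, n_clusters=None).fit(X)

# n_clusters=3: the subclusters are merged into 3 groups by agglomerative clustering.
merged = Birch(threshold=0.5, n_clusters=3).fit(X)

n_sub = len(raw.subcluster_centers_)
n_raw_labels = len(np.unique(raw.subcluster_labels_))
n_merged_labels = len(np.unique(merged.subcluster_labels_))

print("subclusters found:", n_sub)
print("distinct labels with n_clusters=None:", n_raw_labels)
print("distinct labels with n_clusters=3:", n_merged_labels)
```

With n_clusters=None the number of distinct labels equals the number of subclusters, which depends on threshold. With n_clusters=3 the same subclusters are grouped into three final clusters.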
