I ran scikit-learn's k-means algorithm and got the resulting centroids. I now have a new document (one that was not in the initial collection), and I would like to calculate the distance between each centroid and the new document to find out which cluster it should be placed in.
Is there a built-in function to do that, or should I write a similarity function manually?
You can use the predict method to get the closest cluster for each sample in a matrix:
from sklearn.cluster import KMeans

model = KMeans(n_clusters=K)
model.fit(X_train)
label = model.predict(X_test)
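If you also want the actual distance to every centroid and not just the winning label, KMeans has a transform method that returns those distances. Here is a minimal sketch; the toy arrays X_train and X_new are made-up stand-ins for your vectorized documents, and n_clusters=2, n_init=10 and random_state=0 are arbitrary choices for the illustration. With real text, the new document would presumably need to go through the same fitted vectorizer as the training collection before being passed in.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data standing in for vectorized documents (hypothetical values).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
X_new = np.array([[4.8, 5.2]])  # the "new document", same feature space

model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X_train)

# transform() gives the Euclidean distance from each sample to each centroid,
# as an array of shape (n_samples, n_clusters).
distances = model.transform(X_new)

# predict() gives the index of the nearest centroid for each sample.
label = model.predict(X_new)

print(distances)
print(label)
```

The label from predict is the column index of the smallest entry in the matching row of distances, so you only need transform when the distances themselves matter, for example to see how close the runner-up cluster is.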