I am using K-means for a clustering problem. I am trying to find the data point that is closest to the centroid, which I believe is called the medoid.
Is there a way to do this in scikit-learn?
This is not the medoid, but here's something you can try:
>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> from sklearn.metrics import pairwise_distances_argmin_min
>>> X = np.random.randn(10, 4)
>>> km = KMeans(n_clusters=2).fit(X)
>>> closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
>>> closest
array([0, 8])
closest contains the index of the point in X that is closest to each centroid. So X[0] is the closest point in X to centroid 0, and X[8] is the closest to centroid 1.
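If you do want the medoid itself, here is a minimal sketch, assuming the usual definition: the medoid of a cluster is the member with the smallest total distance to the other members of that cluster. The `random_state` and `n_init` values and the `medoids` name are my own choices for illustration, and the distance is Euclidean (the default of `pairwise_distances`). It computes both the closest-to-centroid indices and the per-cluster medoid indices so you can compare them:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, pairwise_distances_argmin_min

# Fixed seed only so the example is repeatable (an arbitrary choice).
rng = np.random.RandomState(0)
X = rng.randn(10, 4)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Index in X of the point closest to each centroid.
closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)

# Medoid of each cluster: the member whose summed distance to the
# other members of the same cluster is smallest.
medoids = []
for k in range(km.n_clusters):
    members = np.flatnonzero(km.labels_ == k)    # indices into X
    D = pairwise_distances(X[members])           # distances within the cluster
    medoids.append(members[D.sum(axis=1).argmin()])
medoids = np.array(medoids)

print("closest to centroid:", closest)
print("medoids:", medoids)
```

The two results often coincide, but they do not have to: the centroid minimizes squared distances to the members, while the medoid minimizes the sum of plain distances and must itself be a data point.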