Phil - 1 year ago 105

Python Question

I am using K-means for a clustering problem. I am trying to find the data point that is closest to each centroid, which I believe is called the medoid.

Is there a way to do this in scikit-learn?

Answer

The point closest to the centroid is not necessarily the medoid, but you can find it like this:

```
>>> import numpy as np
>>> from sklearn.cluster import KMeans
>>> from sklearn.metrics import pairwise_distances_argmin_min
>>> X = np.random.randn(10, 4)
>>> km = KMeans(n_clusters=2).fit(X)
>>> closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
>>> closest
array([0, 8])
```

The array `closest` contains the index of the point in `X` that is closest to each centroid. So `X[0]` is the closest point in `X` to centroid 0, and `X[8]` is the closest to centroid 1.
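
If you do want an actual medoid, here is a minimal sketch. It assumes Euclidean distance and takes the medoid to be the cluster member with the smallest total distance to the other members of its cluster. The helper `cluster_medoids` is a hypothetical name used for illustration, not part of scikit-learn, and `random_state` and `n_init` are set only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances


def cluster_medoids(X, labels):
    """Map each cluster label to the index in X of that cluster's medoid.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    medoids = {}
    for label in np.unique(labels):
        # Row indices of the points assigned to this cluster.
        members = np.flatnonzero(labels == label)
        # Pairwise distances among this cluster's members only.
        dists = pairwise_distances(X[members])
        # Pick the member with the smallest total distance to the others.
        medoids[int(label)] = int(members[dists.sum(axis=1).argmin()])
    return medoids


rng = np.random.RandomState(0)
X = rng.randn(10, 4)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
medoid_idx = cluster_medoids(X, km.labels_)
```

Here `medoid_idx[k]` is the row index in `X` of the medoid of cluster `k`. It can differ from `closest[k]`, because the point nearest the centroid does not have to minimize the total distance to the other cluster members.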