Fraz Fraz - 3 months ago 19
Python Question

How to compute the cluster means in k-means in a single shot using NumPy

I have a function:

import numpy as np

def update(points, closest, centroids):
    return np.array([points[closest==k].mean(axis=0) for k in range(centroids.shape[0])])

This is basically the centroid update step of the k-means algorithm.
Here, points is a matrix with one point per row, and closest holds the index of the cluster each point is assigned to.

All I am doing is computing the new mean of each cluster from the points assigned to it.

Is there a way to get rid of that for loop,
i.e. to compute all the cluster means in one shot?
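
To make the inputs concrete, here is a small sketch of what the loop version computes. The toy values for points, closest and centroids are made up for illustration; only the number of rows of centroids matters, since the function uses it just to get the cluster count.

```python
import numpy as np

def update(points, closest, centroids):
    # Loop version from the question: one mean per cluster index k.
    return np.array([points[closest == k].mean(axis=0)
                     for k in range(centroids.shape[0])])

# Hypothetical toy data: 5 points in 2-D, assigned to 2 clusters.
points = np.array([[0.0, 0.0], [2.0, 0.0],
                   [10.0, 10.0], [12.0, 10.0], [11.0, 13.0]])
closest = np.array([0, 0, 1, 1, 1])   # cluster index of each point
centroids = np.zeros((2, 2))          # only its row count (k = 2) is used

new_centroids = update(points, closest, centroids)
print(new_centroids)                  # rows are [1, 0] and [11, 11]
```

Each output row is the mean of the points whose entry in closest equals that row's index.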


Here's a vectorized approach based on np.add.reduceat -

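# Number of points assigned to each cluster (0 for empty clusters)
c = np.bincount(closest, minlength=centroids.shape[0])
mask = c != 0  # clusters that have at least one point
# Sort the points so that members of the same cluster are contiguous
pts_grp = points[closest.argsort()]
# Start index of each non-empty cluster's block in pts_grp
cut_idx = np.append(0, c[mask].cumsum()[:-1])
# Empty clusters stay NaN; the others get block sum / count
out = np.full((centroids.shape[0], points.shape[1]), np.nan)
out[mask] = np.add.reduceat(pts_grp, cut_idx, axis=0) / c[mask, None].astype(float)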
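
As a rough check, the snippet above can be wrapped in a function and compared against a loop on random data. This is only a sketch: the names update_vectorized and update_loop, the random test data, and the choice to pass the cluster count k directly instead of centroids are assumptions made for illustration. The loop reference here returns NaN for an empty cluster explicitly, which matches what the vectorized version leaves in out.

```python
import numpy as np

def update_loop(points, closest, k):
    # Reference: one mean per cluster, NaN row for an empty cluster.
    rows = []
    for j in range(k):
        members = points[closest == j]
        if len(members):
            rows.append(members.mean(axis=0))
        else:
            rows.append(np.full(points.shape[1], np.nan))
    return np.array(rows)

def update_vectorized(points, closest, k):
    # Same steps as the reduceat-based snippet, with k passed in directly.
    c = np.bincount(closest, minlength=k)
    mask = c != 0
    pts_grp = points[closest.argsort()]
    cut_idx = np.append(0, c[mask].cumsum()[:-1])
    out = np.full((k, points.shape[1]), np.nan)
    out[mask] = np.add.reduceat(pts_grp, cut_idx, axis=0) / c[mask, None].astype(float)
    return out

# Hypothetical test data: 200 points in 3-D, labels in 0..4, but k = 6,
# so cluster 5 is guaranteed to be empty.
rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3))
closest = rng.integers(0, 5, size=200)
k = 6

ref = update_loop(points, closest, k)
vec = update_vectorized(points, closest, k)
same = np.allclose(ref, vec, equal_nan=True)
print(same)
```

Note that the approach needs at least one point overall, since np.add.reduceat is always given a start index of 0.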