jmlarson - 2 months ago 9
Python Question

Clustering points based on their function values and proximity

I have many points

`X`
and their function values
`f`
stored in
`numpy`
arrays. I want to find all points in
`X`
that don't have a better point (smaller
`f`
value) within a distance
`r`
.

`X`
is hundreds of thousands of points, so I can't precompute
`sp.spatial.distance.pdist(X)`
but resort to the following:

``````def cluster(X,f,r):
pts,n = np.shape(X)
centers = []
for i in range(0,pts):
pdist = sp.spatial.distance.cdist(X,[X[i]])
if not np.any(np.logical_and(pdist <= r, f < f[i])):
centers.append(i)
return centers
``````

This takes minutes. Is there a way to quickly cluster based on proximity and another metric?

You can significantly reduce the number of distance computation by keeping a record. For instance, if j is a neighbor of a center i and it has a larger f value, then j can never be a center since one of its neighbors is i which has a smaller f value. Please check the following and let me know if you need clarification.

``````def cluster4(X,f,r):
pts,n = np.shape(X)
centers = np.ones((pts,1),dtype=int)
for i in range(pts):
if not centers[i]:
continue
pdist = sp.spatial.distance.cdist(X,[X[i]])
inrange = (pdist <= r)
inrange[i] = False
lesser = (f < f[i])
if np.any(inrange & lesser):
centers[i] = 0
centers[inrange & np.invert(lesser)] = 0
return np.where(centers == 1)[0]
``````