jmlarson jmlarson - 2 months ago 9
Python Question

Clustering points based on their function values and proximity

I have many points

X
and their function values
f
stored in
numpy
arrays. I want to find all points in
X
that don't have a better point (smaller
f
value) within a distance
r
.

X
is hundreds of thousands of points, so I can't precompute
sp.spatial.distance.pdist(X)
but resort to the following:

def cluster(X,f,r):
pts,n = np.shape(X)
centers = []
for i in range(0,pts):
pdist = sp.spatial.distance.cdist(X,[X[i]])
if not np.any(np.logical_and(pdist <= r, f < f[i])):
centers.append(i)
return centers


This takes minutes. Is there a way to quickly cluster based on proximity and another metric?

Answer

You can significantly reduce the number of distance computation by keeping a record. For instance, if j is a neighbor of a center i and it has a larger f value, then j can never be a center since one of its neighbors is i which has a smaller f value. Please check the following and let me know if you need clarification.

def cluster4(X,f,r):
    pts,n = np.shape(X)
    centers = np.ones((pts,1),dtype=int)
    for i in range(pts):
        if not centers[i]:
            continue
        pdist = sp.spatial.distance.cdist(X,[X[i]])
        inrange = (pdist <= r)
        inrange[i] = False
        lesser = (f < f[i])
        if np.any(inrange & lesser):
            centers[i] = 0
        centers[inrange & np.invert(lesser)] = 0
    return np.where(centers == 1)[0]
Comments