Mark Morrisson - 1 year ago 207

Python Question

I'd like to cluster points using a custom distance and, strangely, it seems that neither the scipy nor the sklearn clustering methods allow a distance function to be specified.

For instance, in `sklearn.cluster.AgglomerativeClustering` and `sklearn.neighbors.kneighbors_graph`, I could not find a way to pass one.

Answer Source

The scipy hierarchical clustering routines that take raw observations (such as `linkage` and `fclusterdata`) accept a custom distance function that takes two 1D vectors specifying a pair of points and returns a scalar. For example, using `fclusterdata`:

```
import numpy as np
from scipy.cluster.hierarchy import fclusterdata

# a custom function that just computes Euclidean distance
def mydist(p1, p2):
    diff = p1 - p2
    return np.vdot(diff, diff) ** 0.5

X = np.random.randn(100, 2)

# cluster once with the custom callable, once with the built-in metric
fclust1 = fclusterdata(X, 1.0, metric=mydist)
fclust2 = fclusterdata(X, 1.0, metric='euclidean')

print(np.allclose(fclust1, fclust2))
# True
```

Valid inputs for the `metric=` kwarg are the same as for `scipy.spatial.distance.pdist`.
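Since the `metric=` kwarg follows `pdist`, one way to see what happens under the hood is to compute the pairwise distances yourself and hand them to `linkage`. The following is only a minimal sketch: the `manhattan` helper, the `average` linkage method, and the `1.5` threshold are illustrative choices, not anything prescribed by scipy.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# hypothetical custom metric: Manhattan (city block) distance
def manhattan(p1, p2):
    return np.abs(p1 - p2).sum()

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))

# condensed pairwise distances, one entry per unordered pair of points
D_custom = pdist(X, metric=manhattan)
D_builtin = pdist(X, metric='cityblock')

# linkage accepts a condensed distance matrix directly
Z = linkage(D_custom, method='average')

# cut the tree at a cophenetic distance threshold (1.5 is arbitrary)
labels = fcluster(Z, t=1.5, criterion='distance')
```

Precomputing the distances like this can be convenient when the custom function is slow, because the same condensed matrix can be reused across several linkage methods or thresholds.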