Candic3 Candic3 -4 years ago 159
Python Question

How to implement callable distance metric in scikit-learn?

I'm using the clustering module in Python's scikit-learn, and I'd like to use a Normalized Euclidean Distance. There is no built-in distance for this (that I know of). Here's a list.

So, I want to implement my own Normalized Euclidean Distance using a callable. The function is part of my distance module and is called distance.normalized_euclidean_distance. It takes three inputs: X, Y, and SD.

However, Normalized Euclidean Distance requires the standard deviation of the sample, but the pairwise distance in scipy only allows two inputs: X and Y.
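For reference, one common definition of the normalized (standardized) Euclidean distance divides each coordinate difference by that feature's standard deviation. This is an assumption about what the function computes, since its implementation is not shown here; s_i denotes the i-th entry of SD and n the number of features:

```latex
d(X, Y) = \sqrt{\sum_{i=1}^{n} \left( \frac{X_i - Y_i}{s_i} \right)^{2}}
```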

How do I allow it to take an additional argument?

I tried putting it in as a **kwarg, but that didn't seem to work:

cluster = DBSCAN(eps=1.0, min_samples=1, metric=distance.normalized_euclidean, SD=stdv)


where distance.normalized_euclidean is the function that I wrote that takes in two arrays, X and Y, and computes the normalized Euclidean distance between them.

...but this throws an error:

TypeError: __init__() got an unexpected keyword argument 'SD'


What is the way to use additional keyword arguments?

Here it says "Any further parameters are passed directly to the distance function.", which made me think that this would be acceptable.

Answer Source

You can pass a lambda function as the metric. It takes the two input arrays and supplies SD itself:

cluster = DBSCAN(eps=1.0, min_samples=1, metric=lambda X, Y: distance.normalized_euclidean(X, Y, SD=stdv))
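The same idea, binding SD ahead of time so that the metric only needs two arguments, can also be written with functools.partial from the standard library. The sketch below is only an illustration: the body of normalized_euclidean is a hypothetical pure-Python stand-in (the asker's real implementation is not shown), and the stdv values are made up.

```python
import math
from functools import partial


def normalized_euclidean(x, y, SD):
    """Hypothetical stand-in for distance.normalized_euclidean.

    Divides each coordinate difference by the matching standard
    deviation in SD, then takes the Euclidean norm of the result.
    """
    return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, SD)))


# Illustrative per-feature standard deviations (assumed values).
stdv = [2.0, 0.5]

# Bind SD once; the resulting callable takes exactly two arguments,
# which is the shape a pairwise metric callable is expected to have.
metric = partial(normalized_euclidean, SD=stdv)

d = metric([0.0, 0.0], [2.0, 0.5])
print(d)
```

The resulting metric object could then be passed as metric=metric in place of the lambda. One possible reason to prefer it: a partial built from a module-level function can be pickled with the standard pickle module, while a lambda cannot, which may matter if the estimator is saved or run in parallel.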