TheBaywatchKid TheBaywatchKid - 4 months ago 62
Python Question

Distance matrix creation using nparray with pdist and squareform

I'm trying to cluster using DBSCAN (scikit learn implementation) and location data. My data is in np array format, but to use DBSCAN with Haversine formula I need to create a distance matrix. I'm getting the following error when I try to do this( a 'module' not callable error.) From what i've reading online this is an import error, but I'm pretty sure thats not the case for me. I've created my own haversine distance formula, but I'm sure the error is not with this.

This is my input data, an np array (ResultArray).

[[ 53.3252628 -6.2644198 ]
[ 53.3287395 -6.2646543 ]
[ 53.33321202 -6.24785807]
[ 53.3261015 -6.2598324 ]
[ 53.325291 -6.2644105 ]
[ 53.3281323 -6.2661467 ]
[ 53.3253074 -6.2644483 ]
[ 53.3388147 -6.2338417 ]
[ 53.3381102 -6.2343826 ]
[ 53.3253074 -6.2644483 ]
[ 53.3228188 -6.2625379 ]
[ 53.3253074 -6.2644483 ]]


And this is the line of code that is erroring.

distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
(ResultArray,(lambda u,v: haversine(u,v))))


This is the error message:

File "Location.py", line 48, in <module>
distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
(ResArray,(lambda u,v: haversine(u,v))))
File "/usr/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 1118, in pdist
dm[k] = dfun(X[i], X[j])
File "Location.py", line 48, in <lambda>
distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
(ResArray,(lambda u,v: haversine(u,v))))
TypeError: 'module' object is not callable


I import scipy as sp. ( import scipy as sp )

Answer

Simply scipy's pdist does not allow to pass in a custom distance function. As you can read in the docs, you have some options, but haverside distance is not within the list of supported metrics.

(Matlab pdist does support the option though, see here)

you need to do the calculation "manually", i.e. with loops, something like this will work:

from numpy import array,zeros

def haversine(lon1, lat1, lon2, lat2):
    """  See the link below for a possible implementation """
    pass

#example input (your's, truncated)
ResultArray = array([[ 53.3252628, -6.2644198 ],
                     [ 53.3287395  , -6.2646543 ],
                     [ 53.33321202 , -6.24785807],
                     [ 53.3253074  , -6.2644483 ]])

N = ResultArray.shape[0]
distance_matrix = zeros((N, N))
for i in xrange(N):
    for j in xrange(N):
        lati, loni = ResultArray[i]
        latj, lonj = ResultArray[j]
        distance_matrix[i, j] = haversine(loni, lati, lonj, latj)
        distance_matrix[j, i] = distance_matrix[i, j]

print distance_matrix
[[ 0.          0.38666203  1.41010971  0.00530489]
 [ 0.38666203  0.          1.22043364  0.38163748]
 [ 1.41010971  1.22043364  0.          1.40848782]
 [ 0.00530489  0.38163748  1.40848782  0.        ]]

Just for reference, an implementation in Python of Haverside can be found here.