TheBaywatchKid - 8 months ago 108

Python Question

I'm trying to cluster location data using DBSCAN (scikit-learn implementation). My data is in a NumPy array, but to use DBSCAN with the haversine formula I need to create a distance matrix. When I try to do this I get the following error (a `'module' object is not callable` error). From what I've read online this is an import error, but I'm pretty sure that's not the case for me. I've written my own haversine distance function, but I'm sure the error is not in it.

**This is my input data, an np array (ResultArray).**

    [[ 53.3252628  -6.2644198 ]
     [ 53.3287395  -6.2646543 ]
     [ 53.33321202 -6.24785807]
     [ 53.3261015  -6.2598324 ]
     [ 53.325291   -6.2644105 ]
     [ 53.3281323  -6.2661467 ]
     [ 53.3253074  -6.2644483 ]
     [ 53.3388147  -6.2338417 ]
     [ 53.3381102  -6.2343826 ]
     [ 53.3253074  -6.2644483 ]
     [ 53.3228188  -6.2625379 ]
     [ 53.3253074  -6.2644483 ]]

**This is the call that raises the error.**

    distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
    (ResultArray,(lambda u,v: haversine(u,v))))

**This is the traceback.**

    File "Location.py", line 48, in <module>
      distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
      (ResArray,(lambda u,v: haversine(u,v))))
    File "/usr/lib/python2.7/dist-packages/scipy/spatial/distance.py", line 1118, in pdist
      dm[k] = dfun(X[i], X[j])
    File "Location.py", line 48, in <lambda>
      distance_matrix = sp.spatial.distance.squareform(sp.spatial.distance.pdist
      (ResArray,(lambda u,v: haversine(u,v))))
    TypeError: 'module' object is not callable

I import scipy with `import scipy as sp`.

Answer

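`scipy`'s `pdist` has no built-in haversine metric: as you can read in the docs, haversine distance is not in the list of supported named metrics. It does accept a callable as the metric, though, and your traceback shows the lambda being entered before the error is raised, so the `'module' object is not callable` message most likely means that the name `haversine` refers to a module (for example after `import haversine`) rather than to your function.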
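A minimal sketch of how a `'module' object is not callable` error arises when a name refers to a module instead of a function, which is one plausible reading of the traceback in the question. The standard-library `math` module is only a stand-in here; nothing below is taken from the original `Location.py`.

```python
import math  # stand-in for any module whose name is later used like a function

try:
    # Calling the module object itself raises TypeError, just as
    # haversine(u, v) would if `haversine` were bound to a module.
    math(1.0)
except TypeError as exc:
    message = str(exc)

print(message)

# callable() is a quick way to check what a name is bound to.
print(callable(math), callable(math.sqrt))
```

If this is the cause, importing the function explicitly (for example `from haversine import haversine`, assuming the function lives in a file named `haversine.py`) or calling `haversine.haversine(u, v)` should avoid the error.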

(Matlab's `pdist` supports a custom distance function, see here.)

You can do the calculation "manually", i.e. with loops; something like this will work:

```
from numpy import array, zeros


def haversine(lon1, lat1, lon2, lat2):
    """See the link below for a possible implementation."""
    pass


# Example input (yours, truncated); each row is [latitude, longitude]
ResultArray = array([[53.3252628, -6.2644198],
                     [53.3287395, -6.2646543],
                     [53.33321202, -6.24785807],
                     [53.3253074, -6.2644483]])

N = ResultArray.shape[0]
distance_matrix = zeros((N, N))
# The matrix is symmetric, so compute each pair once and mirror it
for i in range(N):
    for j in range(i + 1, N):
        lati, loni = ResultArray[i]
        latj, lonj = ResultArray[j]
        distance_matrix[i, j] = haversine(loni, lati, lonj, latj)
        distance_matrix[j, i] = distance_matrix[i, j]

print(distance_matrix)
# Output with haversine() filled in:
# [[ 0.          0.38666203  1.41010971  0.00530489]
#  [ 0.38666203  0.          1.22043364  0.38163748]
#  [ 1.41010971  1.22043364  0.          1.40848782]
#  [ 0.00530489  0.38163748  1.40848782  0.        ]]
```
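For larger inputs, the double loop can be replaced by NumPy broadcasting. The following is only a sketch: the helper name `haversine_matrix` and the Earth radius of 6367 km are assumptions, not part of the original answer, and rows are taken to be `[latitude, longitude]` in degrees as in the example above.

```python
import numpy as np

EARTH_RADIUS_KM = 6367.0  # assumed mean radius; 6371 is also commonly used


def haversine_matrix(coords_deg):
    """Pairwise great-circle distances (km) for rows of [lat, lon] in degrees."""
    coords = np.radians(np.asarray(coords_deg, dtype=float))
    lat = coords[:, 0]
    lon = coords[:, 1]
    # A column minus a row broadcasts to all N x N pairwise differences.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2.0) ** 2)
    # Clipping guards against rounding pushing a marginally outside [0, 1].
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


ResultArray = np.array([[53.3252628, -6.2644198],
                        [53.3287395, -6.2646543],
                        [53.33321202, -6.24785807],
                        [53.3253074, -6.2644483]])
distance_matrix = haversine_matrix(ResultArray)
print(distance_matrix)
```

This builds N x N temporary arrays, so it trades memory for speed compared with the loop.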

Just for reference, a Python implementation of the haversine formula can be found here.
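In case the link is unavailable, here is a minimal sketch of the standard haversine formula with the same four-argument signature as the stub above. The default radius of 6367 km is an assumption (it appears to reproduce the distances printed above; 6371 km is also common), and `haversine_metric` is a hypothetical wrapper name showing the two-argument `(u, v)` shape that `pdist` passes to a callable metric.

```python
from math import asin, cos, radians, sin, sqrt


def haversine(lon1, lat1, lon2, lat2, radius_km=6367.0):
    """Great-circle distance in km between two points in decimal degrees.

    radius_km is an assumed mean Earth radius, not a value from the source.
    """
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2.0) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2.0) ** 2
    return 2.0 * radius_km * asin(sqrt(a))


def haversine_metric(u, v):
    """Two-argument wrapper for rows of the form [lat, lon]."""
    return haversine(u[1], u[0], v[1], v[0])


d = haversine_metric([53.3252628, -6.2644198], [53.3287395, -6.2646543])
print(round(d, 3))  # roughly 0.387 km
```

Note that the question's lambda calls `haversine(u, v)` with two arguments, so a function with this four-argument signature needs a wrapper like the one shown.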