mech mech - 9 months ago 84
Python Question

DBSCAN returns partial clusters

I am trying to implement the code for DBSCAN here:

The portion I am confused about is

expandCluster(P, NeighborPts, C, eps, MinPts)
add P to cluster C
for each point P' in NeighborPts
if P' is not visited
mark P' as visited
NeighborPts' = regionQuery(P', eps)
if sizeof(NeighborPts') >= MinPts
NeighborPts = NeighborPts joined with NeighborPts'
if P' is not yet member of any cluster
add P' to cluster C

My code is below. As is, it currently returns partial clusters where a point should be density connected even if it is not in the immediate eps neighborhood. My code only returns the first few neighbors for each point.

import numpy
import time
from math import radians, cos, sin, asin, sqrt
import re, math

def haversine(lon1, lat1, lon2, lat2):
Calculate the great circle distance between two points
on the earth (specified in decimal degrees) returned as kilometers
# convert decimal degrees to radians
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
# haversine formula
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
km = 6367 * c
return km

def ST_DBSCAN(points,max_distance,MinPts):
global visited
visited = []
noise = []
cluster_id = 0
clusters = []
in_cluster = []
for p in points:
if p not in visited:
# neighbor_points = []
NeighborPts = regionQuery(p,points,max_distance)
if len(NeighborPts) < MinPts:
cluster_id = cluster_id + 1
g = expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster)
return clusters

#return len(NeighborPts)

def expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster):
cluster = []
for point in NeighborPts:
if point not in visited:
new_neighbors = regionQuery(point,points,max_distance)
if len(new_neighbors) >= MinPts:
if point[0] not in in_cluster:
return cluster

def regionQuery(p,points,max_distance):
neighbor_points = []
for j in points:
if j != p:
# print 'P is %s and j is %s' % (p[0],j[0])
dist = haversine(p[1],p[2],j[1],j[2])
if dist <= max_distance:
return neighbor_points

I have a subset below. Points 1 and 5 should be 10.76 km apart so they shouldn't be in the initial query but they should be included in the same cluster because point 5 is density connected through point 3.

pointList = [[1,36.4686,2.8289],

points= pointList

g = ST_DBSCAN(points,10,3)


Your expandCluster function forgets the new neighbors.

Your set update is swapped.