mech - 8 months ago 75

Python Question

I am trying to implement the code for DBSCAN here: http://en.wikipedia.org/wiki/DBSCAN

The portion I am confused about is

`expandCluster(P, NeighborPts, C, eps, MinPts)`

add P to cluster C

for each point P' in NeighborPts

if P' is not visited

mark P' as visited

NeighborPts' = regionQuery(P', eps)

if sizeof(NeighborPts') >= MinPts

NeighborPts = NeighborPts joined with NeighborPts'

if P' is not yet member of any cluster

add P' to cluster C

My code is below. As is, it currently returns partial clusters where a point should be density connected even if it is not in the immediate eps neighborhood. My code only returns the first few neighbors for each point.

`import numpy`

import time

from math import radians, cos, sin, asin, sqrt

import re, math

def haversine(lon1, lat1, lon2, lat2):

"""

Calculate the great circle distance between two points

on the earth (specified in decimal degrees) returned as kilometers

"""

# convert decimal degrees to radians

lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

# haversine formula

dlon = lon2 - lon1

dlat = lat2 - lat1

a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2

c = 2 * asin(sqrt(a))

km = 6367 * c

return km

def ST_DBSCAN(points,max_distance,MinPts):

global visited

visited = []

noise = []

cluster_id = 0

clusters = []

in_cluster = []

for p in points:

if p not in visited:

# neighbor_points = []

visited.append(p)

NeighborPts = regionQuery(p,points,max_distance)

if len(NeighborPts) < MinPts:

noise.append(p)

else:

cluster_id = cluster_id + 1

g = expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster)

clusters.append(g)

return clusters

#return len(NeighborPts)

def expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster):

in_cluster.append(p[0])

cluster = []

cluster.append(p[0])

for point in NeighborPts:

if point not in visited:

visited.append(point)

new_neighbors = regionQuery(point,points,max_distance)

if len(new_neighbors) >= MinPts:

new_neighbors.append(NeighborPts)

if point[0] not in in_cluster:

in_cluster.append(point[0])

cluster.append(point[0])

return cluster

def regionQuery(p,points,max_distance):

neighbor_points = []

for j in points:

if j != p:

# print 'P is %s and j is %s' % (p[0],j[0])

dist = haversine(p[1],p[2],j[1],j[2])

if dist <= max_distance:

neighbor_points.append(j)

neighbor_points.append(p)

return neighbor_points

I have a subset below. Points 1 and 5 should be 10.76 km apart so they shouldn't be in the initial query but they should be included in the same cluster because point 5 is density connected through point 3.

`pointList = [[1,36.4686,2.8289],`

[2,36.4706,2.8589],

[3,36.4726,2.8889],

[4,36.4746,2.9189],

[5,36.4766,2.9489],

[6,36.4786,2.9789],

[7,36.4806,3.0089],

[8,36.4826,3.0389],

[9,36.4846,3.0689],

[10,36.4866,3.0989]]

points= pointList

g = ST_DBSCAN(points,10,3)

Answer

Your `expandCluster`

function forgets the new neighbors.

Your set update is swapped.