DBSCAN returns 个部分聚类
DBSCAN returns partial clusters
我正在尝试在此处实现 DBSCAN 的代码:http://en.wikipedia.org/wiki/DBSCAN
我感到困惑的部分是
expandCluster(P, NeighborPts, C, eps, MinPts)
add P to cluster C
for each point P' in NeighborPts
if P' is not visited
mark P' as visited
NeighborPts' = regionQuery(P', eps)
if sizeof(NeighborPts') >= MinPts
NeighborPts = NeighborPts joined with NeighborPts'
if P' is not yet member of any cluster
add P' to cluster C
我的代码如下。照原样,它目前 returns 部分聚类,其中一个点应该是密度连接的,即使它不在直接的 eps 邻域中。我的代码只有 returns 每个点的前几个邻居。
import numpy
import time
from math import radians, cos, sin, asin, sqrt
import re, math
def haversine(lon1, lat1, lon2, lat2):
"""
Calculate the great circle distance between two points
on the earth (specified in decimal degrees) returned as kilometers
"""
# convert decimal degrees to radians
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
# haversine formula
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
km = 6367 * c
return km
def ST_DBSCAN(points,max_distance,MinPts):
global visited
visited = []
noise = []
cluster_id = 0
clusters = []
in_cluster = []
for p in points:
if p not in visited:
# neighbor_points = []
visited.append(p)
NeighborPts = regionQuery(p,points,max_distance)
if len(NeighborPts) < MinPts:
noise.append(p)
else:
cluster_id = cluster_id + 1
g = expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster)
clusters.append(g)
return clusters
#return len(NeighborPts)
def expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster):
in_cluster.append(p[0])
cluster = []
cluster.append(p[0])
for point in NeighborPts:
if point not in visited:
visited.append(point)
new_neighbors = regionQuery(point,points,max_distance)
if len(new_neighbors) >= MinPts:
new_neighbors.append(NeighborPts)
if point[0] not in in_cluster:
in_cluster.append(point[0])
cluster.append(point[0])
return cluster
def regionQuery(p,points,max_distance):
neighbor_points = []
for j in points:
if j != p:
# print 'P is %s and j is %s' % (p[0],j[0])
dist = haversine(p[1],p[2],j[1],j[2])
if dist <= max_distance:
neighbor_points.append(j)
neighbor_points.append(p)
return neighbor_points
我在下面有一个子集。点 1 和点 5 应该相距 10.76 公里,因此它们不应该在初始查询中,但它们应该包含在同一个集群中,因为点 5 是通过点 3 连接的密度。
pointList = [[1,36.4686,2.8289],
[2,36.4706,2.8589],
[3,36.4726,2.8889],
[4,36.4746,2.9189],
[5,36.4766,2.9489],
[6,36.4786,2.9789],
[7,36.4806,3.0089],
[8,36.4826,3.0389],
[9,36.4846,3.0689],
[10,36.4866,3.0989]]
points= pointList
g = ST_DBSCAN(points,10,3)
您的 expandCluster
函数忘记了新邻居。
您的设置更新已交换。
我正在尝试在此处实现 DBSCAN 的代码:http://en.wikipedia.org/wiki/DBSCAN
我感到困惑的部分是
expandCluster(P, NeighborPts, C, eps, MinPts)
add P to cluster C
for each point P' in NeighborPts
if P' is not visited
mark P' as visited
NeighborPts' = regionQuery(P', eps)
if sizeof(NeighborPts') >= MinPts
NeighborPts = NeighborPts joined with NeighborPts'
if P' is not yet member of any cluster
add P' to cluster C
我的代码如下。照原样,它目前 returns 部分聚类,其中一个点应该是密度连接的,即使它不在直接的 eps 邻域中。我的代码只有 returns 每个点的前几个邻居。
import numpy
import time
from math import radians, cos, sin, asin, sqrt
import re, math
def haversine(lon1, lat1, lon2, lat2):
"""
Calculate the great circle distance between two points
on the earth (specified in decimal degrees) returned as kilometers
"""
# convert decimal degrees to radians
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
# haversine formula
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * asin(sqrt(a))
km = 6367 * c
return km
def ST_DBSCAN(points,max_distance,MinPts):
global visited
visited = []
noise = []
cluster_id = 0
clusters = []
in_cluster = []
for p in points:
if p not in visited:
# neighbor_points = []
visited.append(p)
NeighborPts = regionQuery(p,points,max_distance)
if len(NeighborPts) < MinPts:
noise.append(p)
else:
cluster_id = cluster_id + 1
g = expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster)
clusters.append(g)
return clusters
#return len(NeighborPts)
def expandCluster(p,NeighborPts,max_distance,MinPts,in_cluster):
in_cluster.append(p[0])
cluster = []
cluster.append(p[0])
for point in NeighborPts:
if point not in visited:
visited.append(point)
new_neighbors = regionQuery(point,points,max_distance)
if len(new_neighbors) >= MinPts:
new_neighbors.append(NeighborPts)
if point[0] not in in_cluster:
in_cluster.append(point[0])
cluster.append(point[0])
return cluster
def regionQuery(p,points,max_distance):
neighbor_points = []
for j in points:
if j != p:
# print 'P is %s and j is %s' % (p[0],j[0])
dist = haversine(p[1],p[2],j[1],j[2])
if dist <= max_distance:
neighbor_points.append(j)
neighbor_points.append(p)
return neighbor_points
我在下面有一个子集。点 1 和点 5 应该相距 10.76 公里,因此它们不应该在初始查询中,但它们应该包含在同一个集群中,因为点 5 是通过点 3 连接的密度。
pointList = [[1,36.4686,2.8289],
[2,36.4706,2.8589],
[3,36.4726,2.8889],
[4,36.4746,2.9189],
[5,36.4766,2.9489],
[6,36.4786,2.9789],
[7,36.4806,3.0089],
[8,36.4826,3.0389],
[9,36.4846,3.0689],
[10,36.4866,3.0989]]
points= pointList
g = ST_DBSCAN(points,10,3)
您的 expandCluster
函数忘记了新邻居。
您的设置更新已交换。