DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering method.
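The algorithm has only two parameters: the distance threshold eps and the minimum number of points minPts. If the circle of radius eps centered on a point contains at least minPts points (counting the point itself), that point is a core point, and it and the points inside the circle belong to the same cluster.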
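The density check described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the linked implementations: the helper names `neighbors_within` and `is_core_point`, the Euclidean distance, and the sample points are all assumptions made for the example.

```python
import math

def neighbors_within(points, index, eps):
    """Return indices of all points within eps of points[index], itself included."""
    center = points[index]
    return [i for i, p in enumerate(points) if math.dist(center, p) <= eps]

def is_core_point(points, index, eps, min_pts):
    """A point passes the density check when its eps-neighborhood holds >= min_pts points."""
    return len(neighbors_within(points, index, eps)) >= min_pts

# Hypothetical sample data: three nearby points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(is_core_point(points, 0, eps=1.5, min_pts=3))  # True: 3 points within 1.5
print(is_core_point(points, 3, eps=1.5, min_pts=3))  # False: only the point itself
```

Note that the point itself counts toward minPts here, matching the pseudocode's RangeQuery, which returns the query point as well.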
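Because the algorithm repeatedly needs all points within eps of a given point, a k-d tree is typically used to speed up this range query.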
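To show how a k-d tree can answer the "all points within eps" query without scanning every point, here is a small pure-Python sketch. It is not the linked kdtree.py; the dict-based node layout and the names `build_kdtree` and `kd_range_query` are assumptions chosen for brevity, and Euclidean distance is assumed.

```python
import math

def build_kdtree(points, indices=None, depth=0):
    """Build a k-d tree as nested dicts; each node stores one point index."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])          # cycle through the coordinate axes
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2                # median point becomes the split node
    return {
        "index": indices[mid],
        "axis": axis,
        "left": build_kdtree(points, indices[:mid], depth + 1),
        "right": build_kdtree(points, indices[mid + 1:], depth + 1),
    }

def kd_range_query(node, points, target, eps, found=None):
    """Collect indices of all points within eps of target."""
    if found is None:
        found = []
    if node is None:
        return found
    point = points[node["index"]]
    if math.dist(point, target) <= eps:
        found.append(node["index"])
    # Signed distance from the target to the splitting plane of this node.
    diff = target[node["axis"]] - point[node["axis"]]
    if diff <= eps:    # the eps-ball reaches the left (smaller-coordinate) side
        kd_range_query(node["left"], points, target, eps, found)
    if diff >= -eps:   # the eps-ball reaches the right (larger-coordinate) side
        kd_range_query(node["right"], points, target, eps, found)
    return found

# Hypothetical sample data.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
tree = build_kdtree(points)
print(sorted(kd_range_query(tree, points, (0.0, 0.0), 1.5)))  # [0, 1, 2]
```

A subtree is skipped only when the splitting plane is farther than eps from the target, so the result is the same as a full scan, while well-separated regions of the data are never visited.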
DBSCAN(DB, distFunc, eps, minPts) {
    C := 0                                                  /* Cluster counter */
    for each point P in database DB {
        if label(P) ≠ undefined then continue               /* Previously processed in inner loop */
        Neighbors N := RangeQuery(DB, distFunc, P, eps)     /* Find neighbors */
        if |N| < minPts then {                              /* Density check */
            label(P) := Noise                               /* Label as Noise */
            continue
        }
        C := C + 1                                          /* next cluster label */
        label(P) := C                                       /* Label initial point */
        SeedSet S := N \ {P}                                /* Neighbors to expand */
        for each point Q in S {                             /* Process every seed point Q */
            if label(Q) = Noise then label(Q) := C          /* Change Noise to border point */
            if label(Q) ≠ undefined then continue           /* Previously processed (e.g., border point) */
            label(Q) := C                                   /* Label neighbor */
            Neighbors N := RangeQuery(DB, distFunc, Q, eps) /* Find neighbors */
            if |N| ≥ minPts then {                          /* Density check (if Q is a core point) */
                S := S ∪ N                                  /* Add new neighbors to seed set */
            }
        }
    }
}

RangeQuery(DB, distFunc, Q, eps) {
    Neighbors N := empty list
    for each point P in database DB {                       /* Scan all points in the database */
        if distFunc(Q, P) ≤ eps then {                      /* Compute distance and check epsilon */
            N := N ∪ {P}                                    /* Add to result */
        }
    }
    return N
}
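The pseudocode above translates fairly directly into Python. The version below is a sketch under a few assumptions that are not in the pseudocode: points are tuples indexed by position, noise is encoded as -1, "undefined" is encoded as None, and the distance function defaults to Euclidean `math.dist`. It keeps the linear-scan RangeQuery; a k-d tree query could be swapped in for `range_query`.

```python
import math

NOISE = -1  # assumed encoding for noise; cluster labels start at 1

def range_query(points, q, eps, dist=math.dist):
    """Linear scan: indices of all points within eps of points[q], itself included."""
    return [i for i, p in enumerate(points) if dist(points[q], p) <= eps]

def dbscan(points, eps, min_pts, dist=math.dist):
    labels = [None] * len(points)          # None plays the role of "undefined"
    cluster = 0
    for p in range(len(points)):
        if labels[p] is not None:          # already processed in an inner loop
            continue
        neighbors = range_query(points, p, eps, dist)
        if len(neighbors) < min_pts:       # density check failed
            labels[p] = NOISE
            continue
        cluster += 1
        labels[p] = cluster
        seeds = [q for q in neighbors if q != p]
        queued = set(neighbors)            # keeps the growing seed list duplicate-free
        i = 0
        while i < len(seeds):              # seeds may grow while we iterate
            q = seeds[i]
            i += 1
            if labels[q] == NOISE:
                labels[q] = cluster        # noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = range_query(points, q, eps, dist)
            if len(q_neighbors) >= min_pts:    # q is a core point: expand from it
                for n in q_neighbors:
                    if n not in queued:
                        queued.add(n)
                        seeds.append(n)
    return labels

# Hypothetical sample data: two tight groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [1, 1, 1, 2, 2, 2, -1]
```

The seed set is kept as a list plus a `queued` set so that it can grow during iteration, which is what `S := S ∪ N` inside the loop requires.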
- DBSCAN Python implementation: https://github.com/lyyiangang/python_test/blob/master/dbscan/dbscan.py
- k-d tree Python implementation: https://github.com/lyyiangang/python_test/blob/master/dbscan/kdtree.py
- C++ implementation: https://github.com/Eleobert/dbscan
- scikit-learn DBSCAN API reference: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
- scikit-learn DBSCAN demo: https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py