How to filter clusters produced by DBSCAN based on size?

Question

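I have applied DBSCAN to cluster a dataset consisting of the X, Y and Z coordinates of each point in a point cloud. I want to plot only the clusters that have fewer than 100 points. This is what I have so far: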

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

clustering = DBSCAN(eps=0.1, min_samples=20, metric='euclidean').fit(only_xy)
plt.scatter(only_xy[:, 0], only_xy[:, 1],
            c=clustering.labels_, cmap='rainbow')
clusters = clustering.components_
# Store the labels
labels = clustering.labels_

# Then get the frequency count of the non-negative labels
counts = np.bincount(labels[labels >= 0])

print(counts)

Output: 
[1278  564  208   47   36   30  191   54   24   18   40  915   26   20
   24  527   56  677   63   57   61 1544  512   21   45  187   39  132
   48   55  160   46   28   18   55   48   35   92   29   88   53   55
   24   52  114   49   34   34   38   52   38   53   69]

So I have found the number of points in each cluster, but I'm not sure how to select only the clusters with fewer than 100 points.

Answer 1

You could find the indices of the points whose cluster has fewer than 100 members:

ls, cs = np.unique(labels,return_counts=True)
dic = dict(zip(ls,cs))
idx = [i for i,label in enumerate(labels) if dic[label] <100 and label >= 0]

You can then apply the resulting indices to your data and to the DBSCAN labels, for example (more or less):

plt.scatter(only_xy[idx, 0], only_xy[idx, 1],
        c=clustering.labels_[idx], cmap='rainbow')
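
The same selection can also be written without a Python loop. The following is a minimal sketch, not the answerer's code: the `labels` array is made up to stand in for `clustering.labels_`, and `max_size` is a hypothetical threshold (4 here so the toy data shows an effect; the question uses 100).

```python
import numpy as np

# Made-up stand-in for clustering.labels_ (-1 marks noise).
labels = np.array([0, 0, 0, 0, 1, 1, -1, 2, 2, 2, -1, 1])
max_size = 4  # hypothetical threshold; the question uses 100

# counts[k] is the number of points in cluster k (noise excluded).
counts = np.bincount(labels[labels >= 0])

# Cluster ids whose size is below the threshold.
small_labels = np.flatnonzero(counts < max_size)

# Boolean mask over all points: True for points in a small cluster.
# Noise (-1) never matches because small_labels holds only ids >= 0.
mask = np.isin(labels, small_labels)
```

With real data, the mask would be used the same way as `idx`, for instance `only_xy[mask, 0]`, `only_xy[mask, 1]` and `labels[mask]` in the `plt.scatter` call.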

Answer 2

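I think if you run this code, you can get the labels, and the cluster components, of the clusters with more than 100 points (for the clusters with fewer than 100 points that the question asks about, flip the comparison to count < 100):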

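from collections import Counter
# Labels of clusters with more than 100 points; noise (label -1) is skipped
labels_with_morethan100 = [label for (label, count) in Counter(clustering.labels_).items() if count > 100 and label >= 0]
# components_ holds only core samples, so compare against the labels of those samples
core_labels = clustering.labels_[clustering.core_sample_indices_]
clusters_biggerthan100 = clustering.components_[np.isin(core_labels, labels_with_morethan100)]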
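
One detail worth illustrating: in scikit-learn, `components_` contains only the core samples, so its rows line up with `labels_[core_sample_indices_]` rather than with every non-noise label. The sketch below uses made-up arrays in place of a fitted model (`X`, `labels`, `core_idx`, `components` and `min_size` are all hypothetical stand-ins) to show the difference between selecting core samples and selecting all points of the large clusters.

```python
import numpy as np
from collections import Counter

# Made-up stand-ins for a fitted DBSCAN's attributes.
X = np.arange(16, dtype=float).reshape(8, 2)   # like only_xy
labels = np.array([0, 0, 0, 1, 1, -1, 0, 1])   # like clustering.labels_
core_idx = np.array([0, 1, 3, 6])              # like clustering.core_sample_indices_
components = X[core_idx]                       # like clustering.components_
min_size = 3  # hypothetical threshold; the answer uses 100

# Cluster sizes, with noise (-1) left out.
sizes = Counter(labels[labels >= 0].tolist())
big_labels = [lab for lab, n in sizes.items() if n > min_size]

# Core samples only: index the labels with core_idx so lengths match.
core_labels = labels[core_idx]
big_core_points = components[np.isin(core_labels, big_labels)]

# All points (core and border) of the large clusters.
big_all_points = X[np.isin(labels, big_labels)]
```

If border points should be plotted too, the second form (masking the full data with the full label array) is probably the one to use.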

python

machine-learning

unsupervised-learning

dbscan

scikit-learn