DBSCAN in scikit-learn of Python: Trouble understanding result of DBSCAN

Question

This example is from Data Science for Dummies:

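import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

digits = load_digits()
X = digits.data
ground_truth = digits.target

# Standardize the 64 pixel features, then project onto 40 principal components
pca = PCA(n_components=40)
Cx = pca.fit_transform(scale(X))

# DBSCAN has no random_state parameter in current scikit-learn, so it is dropped here
DB = DBSCAN(eps=4.35, min_samples=25)
DB.fit(Cx)

for cl in np.unique(DB.labels_):
    if cl >= 0:  # label -1 marks noise points
        example = np.min(np.where(DB.labels_ == cl))  # question 1
        plt.subplot(2, 3, cl + 1)  # one panel per cluster (assumes at most 6 clusters)
        plt.imshow(digits.images[example], cmap='binary',  # question 2
                   interpolation='none')
        plt.title('cl ' + str(cl))
plt.show()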

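My questions are:

1. np.where(DB.labels_==cl): I don't understand which array np.where is applied to. When I print np.where(DB.labels_==cl), it looks as if it is applied to DB.core_sample_indices_, but I don't see why. From the np.where documentation, I would expect np.where(DB.labels_==cl) to be applied to DB.labels_.
2. Why does np.min(np.where(DB.labels_==cl)) give me the index that plots the correct image in digits.images?

Thanks.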

Answer 1

The output of the operation DB.labels_ == cl is a boolean array: (DB.labels_ == cl)[i] is True if DB.labels_[i] == cl, and False otherwise.
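A minimal sketch of that comparison, using a small hand-written labels array (hypothetical values standing in for DB.labels_, with -1 for noise as DBSCAN uses):

```python
import numpy as np

# Hypothetical labels, as DBSCAN.labels_ would store them (-1 = noise)
labels = np.array([1, -1, 0, 1, 0, 2, 1])
cl = 1

# Elementwise comparison: a boolean array with the same shape as labels
mask = labels == cl
print(mask)
```

The comparison does not filter anything by itself; it only marks, position by position, which points carry label cl.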

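So np.where is applied to the array DB.labels_ == cl. Called with a single argument, np.where(condition) behaves like np.nonzero(condition): it returns a tuple of index arrays, one per dimension, holding the indices of the non-zero (here, True) elements, not the elements themselves.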
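To see what the one-argument form returns, here is a sketch on the same hypothetical labels array; for a 1-D input the tuple holds a single array of positions:

```python
import numpy as np

labels = np.array([1, -1, 0, 1, 0, 2, 1])
mask = labels == 1

# One-argument form: same result as np.nonzero(mask)
idx = np.where(mask)
print(idx)              # a tuple with one index array per dimension
print(np.nonzero(mask))
```

Because the result is a tuple, idx[0] is the plain array of indices if you need it directly.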

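The operation np.where(DB.labels_ == cl) therefore returns the indices of the elements of DB.labels_ that equal cl. These are the points of the dataset passed to fit that DB has assigned to cluster cl. This set can overlap heavily with DB.core_sample_indices_, because every core sample belongs to some cluster, but it is not the same thing: it also contains border points, which are cluster members that are not core samples themselves.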
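The difference between cluster members and core samples can be checked on a tiny toy dataset. The sketch below assumes hypothetical 1-D points and parameters chosen by hand: with eps=1.5 and min_samples=3 (a count that includes the point itself), only the points at 1 and 2 are core samples, the points at 0 and 3 join the cluster as border points, and the point at 10 is noise:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 1-D data (hypothetical): a dense run 0..3 and an isolated point at 10
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])

db = DBSCAN(eps=1.5, min_samples=3).fit(X)

print(db.labels_)                 # one label per row of X, in the same order
print(db.core_sample_indices_)    # only the core samples
print(np.where(db.labels_ == 0))  # every member of cluster 0, core or border
```

Here np.where(db.labels_ == 0) contains the border points 0 and 3, which core_sample_indices_ does not, so the smallest member index can belong to a border point.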
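Here np.min returns the smallest index in that array, so it retrieves the first point in the dataset that was classified as cluster cl. Looping over all clusters therefore gives you one example image per cluster.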
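The loop in the question can be reduced to a sketch that only collects the first index per cluster, again on a hypothetical labels array. np.min accepts the 1-tuple returned by np.where because NumPy converts it to a 1 x n array; np.flatnonzero(...)[0] is a more explicit way to get the same index:

```python
import numpy as np

labels = np.array([1, -1, 0, 1, 0, 2, 1])

first_example = {}
for cl in np.unique(labels):
    if cl >= 0:  # skip noise, which DBSCAN labels -1
        idx = np.min(np.where(labels == cl))
        # Same index, without relying on the tuple-to-array conversion
        assert idx == np.flatnonzero(labels == cl)[0]
        first_example[int(cl)] = int(idx)

print(first_example)  # {0: 2, 1: 0, 2: 5}
```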

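This index is also a valid index into digits.images, because DB.labels_ contains one label for each point of the dataset you passed to DB.fit, in the same order. That dataset, Cx, was built from X = digits.data by scaling and PCA, neither of which reorders the rows, so row i of Cx corresponds to digits.images[i].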
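A sketch that checks this alignment on the question's own pipeline (same parameters, minus the random_state argument that current DBSCAN does not accept). It only verifies shapes and row correspondence, not the clustering result itself:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

digits = load_digits()

# digits.data is digits.images flattened row by row, so row i of both is the same digit
assert np.array_equal(digits.data, digits.images.reshape(len(digits.images), -1))

# PCA.fit_transform keeps the row order: row i of Cx is still sample i
Cx = PCA(n_components=40).fit_transform(scale(digits.data))
DB = DBSCAN(eps=4.35, min_samples=25).fit(Cx)

# One label per input row, so DB.labels_[i] describes digits.images[i]
print(len(DB.labels_), len(digits.images))
```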
