Kmeans clustering changes for each training

Question

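I am using the sklearn KMeans algorithm to group multiple observations into 4 clusters, and I have set random_state and a NumPy seed so that I always get the same results. However, every time I reload the code in Google Colab and every time I rerun the training, I get different results in terms of the number of observations in each cluster. Here is the code: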

 import numpy as np
 np.random.seed(5)
 from sklearn.cluster import KMeans
 kmeans = KMeans(n_clusters=4,init='k-means++',n_init=1,max_iter=3000,random_state=354)
 kmeans.fit(X)
 y_kmeans = kmeans.predict(X)

How can I always get the same results (in terms of the number of observations in each cluster)?

Thanks in advance.

Answer 1

From the documentation:

If the algorithm stops before fully converging (because of ``tol`` or
``max_iter``), ``labels_`` and ``cluster_centers_`` will not be consistent,
i.e. the ``cluster_centers_`` will not be the means of the points in each
cluster. Also, the estimator will reassign ``labels_`` after the last
iteration to make ``labels_`` consistent with ``predict`` on the training
set.
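One way to see whether the passage above applies to a given run is to compare the fitted estimator's n_iter_ attribute with max_iter. The sketch below is only an illustration: X_demo is synthetic data from make_blobs standing in for the question's X, which is not shown, and the other settings are copied from the question.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the question's X (assumption: the real data is not shown).
X_demo, _ = make_blobs(n_samples=300, centers=4, random_state=0)

kmeans = KMeans(n_clusters=4, init="k-means++", n_init=1,
                max_iter=3000, random_state=354)
kmeans.fit(X_demo)

# n_iter_ is the number of iterations actually run. If it equals max_iter,
# the run was cut off by the iteration cap instead of meeting the
# convergence criterion.
hit_iteration_cap = kmeans.n_iter_ >= kmeans.max_iter
print("iterations run:", kmeans.n_iter_, "| hit max_iter:", hit_iteration_cap)
```

If hit_iteration_cap is True on real data, raising max_iter is a reasonable first thing to try.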

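To get a good handle on max_iter, see the k_means function in sklearn.cluster. With return_n_iter=True it also returns best_n_iter, the number of iterations used by the run that produced the best result, as a fourth value after the centroids, labels, and inertia.

Here is an example: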

 from sklearn.cluster import k_means
 centroids, labels, inertia, best_n_iter = k_means(X, n_clusters=2, init='k-means++', random_state=0, return_n_iter=True)
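To check the thing the question actually asks about, the cluster sizes, it may help to fit twice with the same random_state and compare the sorted counts. This is a minimal sketch under assumptions: X_demo is synthetic data from make_blobs, and cluster_sizes is a hypothetical helper written for this illustration, not part of scikit-learn. Sorting the counts makes the comparison independent of which integer label each cluster happens to receive.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the question's X (assumption: the real data is not shown).
X_demo, _ = make_blobs(n_samples=300, centers=4, random_state=0)

def cluster_sizes(data, seed):
    """Hypothetical helper: fit KMeans and return the sorted cluster sizes."""
    model = KMeans(n_clusters=4, init="k-means++", n_init=1,
                   max_iter=3000, random_state=seed)
    labels = model.fit_predict(data)
    # Count the observations per cluster, then sort so that the result
    # does not depend on the arbitrary numbering of the clusters.
    return np.sort(np.bincount(labels, minlength=4))

sizes_a = cluster_sizes(X_demo, seed=354)
sizes_b = cluster_sizes(X_demo, seed=354)
same_sizes = np.array_equal(sizes_a, sizes_b)
print(sizes_a, sizes_b, "identical:", same_sizes)
```

A fixed random_state only pins down the initialization. If the rows or values of X differ between Colab sessions (for example because of shuffled loading or random sampling during preprocessing), the cluster sizes can still change, so it is worth confirming that X itself is identical across runs.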

python

k-means

scikit-learn