Kmeans clustering changes for each training

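I am using the sklearn KMeans algorithm to group multiple observations into 4 clusters, and I have set random_state and a NumPy seed to always get the same results. However, each time I reload the code in Google Colab and each time I run the training, I get different results in terms of the number of observations in each cluster. Here is the code: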

    import numpy as np
    from sklearn.cluster import KMeans

    np.random.seed(5)
    kmeans = KMeans(n_clusters=4, init='k-means++', n_init=1, max_iter=3000, random_state=354)
    kmeans.fit(X)
    y_kmeans = kmeans.predict(X)

How can I always get the same results (in terms of the number of observations in each cluster)?

Thanks in advance

From the documentation:

If the algorithm stops before fully converging (because of ``tol`` or
``max_iter``), ``labels_`` and ``cluster_centers_`` will not be consistent,
i.e. the ``cluster_centers_`` will not be the means of the points in each
cluster. Also, the estimator will reassign ``labels_`` after the last
iteration to make ``labels_`` consistent with ``predict`` on the training
set.
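A quick way to see whether a fit stopped at the iteration cap is to compare the fitted `n_iter_` attribute with `max_iter`. The following is only a sketch: the data is generated with `make_blobs` as a stand-in for `X` (the real data is not shown in the question), and the helper name `hit_iteration_cap` is made up for this illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for X (assumption: the question's data is not available).
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.5, random_state=0)


def hit_iteration_cap(model):
    """Return True if the fit used every allowed iteration (hypothetical helper)."""
    return model.n_iter_ >= model.max_iter


# A generous cap: the fit is expected to stop well before max_iter.
converged = KMeans(n_clusters=4, n_init=1, max_iter=300, random_state=354).fit(X_demo)

# A cap of one iteration: the fit necessarily uses its whole budget.
capped = KMeans(n_clusters=4, n_init=1, max_iter=1, random_state=354).fit(X_demo)

print("generous cap:", converged.n_iter_, hit_iteration_cap(converged))
print("cap of 1:    ", capped.n_iter_, hit_iteration_cap(capped))
```

If the helper returns True for a real fit, raising `max_iter` is worth trying before reading anything into the cluster sizes.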

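To handle max_iter well, see k_means in sklearn.cluster. Setting return_n_iter to True also returns best_n_iter, which corresponds to the number of iterations used to obtain the best result.

Here is an example:

    from sklearn.cluster import k_means
    centroids, labels, inertia, best_n_iter = k_means(X, n_clusters=2, init='k-means++', random_state=0, return_n_iter=True)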
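To check whether the clustering itself is reproducible, one option is to fit twice on the same array with the same `random_state` and compare the cluster sizes. This is a minimal sketch under assumptions: `X_demo` is synthetic data from `make_blobs`, not the question's `X`, and `cluster_sizes` is a hypothetical helper written for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for X (assumption: the question's data is not available).
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.5, random_state=0)


def cluster_sizes(X, seed):
    """Fit KMeans with a fixed seed and count the observations per cluster."""
    model = KMeans(n_clusters=4, init="k-means++", n_init=1,
                   max_iter=3000, random_state=seed)
    labels = model.fit_predict(X)
    return np.bincount(labels, minlength=4)


sizes_a = cluster_sizes(X_demo, seed=354)
sizes_b = cluster_sizes(X_demo, seed=354)

print(sizes_a, sizes_b)
print("identical:", np.array_equal(sizes_a, sizes_b))
```

If the sizes match in a check like this but still differ between Colab sessions, it may be worth verifying that `X` itself is identical from run to run (for example, that no unseeded shuffling or sampling happens before the fit).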