Python, Scikit-learn, K-means: What does the parameter n_init actually do?

I am a beginner in Python. I am trying to understand what the n_init parameter of sklearn.cluster.KMeans does.

From the documentation:

n_init : int, default: 10

Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.

At first I thought it meant the number of times the code would run, until I found this, and then I realized that is what max_iter does.

What exactly does the n_init parameter do? I really can't work it out.

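In K-means, the initial positions of the centroids play a very important role in convergence. Sometimes the initial centroids are placed in such a way that the clusters keep changing drastically over successive iterations, and max_iter is reached before the convergence condition can be met, so the clusters we are left with may be incorrect. The n_init parameter was introduced to overcome this problem.

The value of n_init determines how many different sets of randomly chosen initial centroids the algorithm tries. Each set gets its own full run (up to max_iter iterations), and the runs are compared by inertia, the sum of squared distances from each point to its assigned centroid. The run with the lowest inertia is treated as the best solution, and its centroids and cluster labels are returned.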
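
As a rough illustration of the restart behaviour described above, the sketch below emulates n_init=10 by hand: it fits ten single-initialization models with different seeds and keeps the one with the lowest inertia. The synthetic data, the choice of three clusters, and the seeds 0 to 9 are assumptions made purely for this example, and the loop only mirrors the idea; it is not scikit-learn's internal code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): three 2-D Gaussian blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(100, 2)) for c in centers])

# Emulate n_init=10 by hand: ten runs, each with ONE random initialization.
# max_iter caps the iterations inside a single run; it is unrelated to
# how many runs there are.
runs = [
    KMeans(n_clusters=3, init="random", n_init=1, max_iter=300,
           random_state=seed).fit(X)
    for seed in range(10)
]
inertias = [run.inertia_ for run in runs]

# Keep the run with the lowest inertia
# (sum of squared distances from each point to its assigned centroid).
best = min(runs, key=lambda run: run.inertia_)

print("inertia per run:", np.round(inertias, 2))
print("best inertia:", round(best.inertia_, 2))
print("iterations used by the best run:", best.n_iter_)

# The same idea, with scikit-learn handling the restarts internally.
auto = KMeans(n_clusters=3, init="random", n_init=10, random_state=0).fit(X)
print("n_init=10 inertia:", round(auto.inertia_, 2))
```

On well-separated data like this, most runs will probably end at the same inertia. The restarts matter more when the clusters overlap or when n_clusters is large, because a single unlucky initialization is then more likely to get stuck in a poor solution.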

If you are interested, you can also look at the k-means++ algorithm, which was designed specifically to address this problem.
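
In scikit-learn, the seeding strategy is chosen with the init parameter of KMeans, and "k-means++" is its default value. The sketch below fits one run with purely random seeding and one with k-means++ seeding so the two can be compared. The data, the cluster count, and the seed are assumptions for illustration, and which strategy gives the lower inertia on a given dataset is not guaranteed.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): three tight 2-D blobs.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(4.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(2.0, 3.0), scale=0.5, size=(50, 2)),
])

results = {}
for init in ("random", "k-means++"):
    # n_init=1 so that only the seeding strategy differs between the two fits.
    model = KMeans(n_clusters=3, init=init, n_init=1, random_state=0).fit(X)
    results[init] = (model.inertia_, model.n_iter_)
    print(f"init={init!r}: inertia={model.inertia_:.2f}, "
          f"iterations={model.n_iter_}")
```

k-means++ spreads the initial centroids apart, which tends to reduce the number of restarts needed, but it can be combined with n_init greater than 1 as well.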

You can also check this link for more details on the problem of initial centroids.

python

cluster-analysis

machine-learning

k-means

scikit-learn