Kmeans function - Amap package - what nstart stands for

Question

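I don't understand what the nstart argument changes in the algorithm.

If centers = 8, the function will cluster the data into 8 groups. But what does nstart change?

This is the explanation in the documentation: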

centers:    
Either the number of clusters or a set of initial cluster centers. If the first, a random set of rows in x are chosen as the initial centers.

nstart:
If centers is a number, how many random sets should be chosen?

Answer 1

See the Details section of the documentation:

The algorithm of Hartigan and Wong (1979) is used by default. Note that some authors use k-means to refer to a specific algorithm rather than the general method: most commonly the algorithm given by MacQueen (1967) but sometimes that given by Lloyd (1957) and Forgy (1965). The Hartigan–Wong algorithm generally does a better job than either of those, but trying several random starts (nstart> 1) is often recommended. In rare cases, when some of the points (rows of x) are extremely close, the algorithm may not converge in the “Quick-Transfer” stage, signalling a warning (and returning ifault = 4). Slight rounding of the data may be advisable in that case.

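nstart is the number of random starts. I can't explain the statistical details, but in their example code the authors of this function chose 25 random starts: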

## random starts do help here with too many clusters
## (and are often recommended anyway!):
(cl <- kmeans(x, 5, nstart = 25))
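To see what the extra starts buy you, here is a minimal sketch that fits the same data once with a single random start and once with 25. It assumes stats::kmeans from base R (not amap::Kmeans), and the toy data are modelled on the example in ?kmeans; the variable names are my own. The multi-start fit typically ends up with an equal or lower tot.withinss.

```r
# Toy data, modelled on the example in ?kmeans: two 2-D Gaussian blobs.
set.seed(42)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Fix the seed before each call so the comparison is reproducible.
set.seed(1)
fit_single <- kmeans(x, centers = 5, nstart = 1, iter.max = 50)
set.seed(1)
fit_multi <- kmeans(x, centers = 5, nstart = 25, iter.max = 50)

# Total within-cluster sum of squares for each fit (lower is better).
c(single = fit_single$tot.withinss, multi = fit_multi$tot.withinss)
```

Asking for 5 clusters on data with only two real groups leaves many possible local optima, which is the situation where several random starts tend to help.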

Answer 2

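Unfortunately, ?kmeans does not explain this precisely (in either the stats or the amap package). However, you can get an idea by looking at the kmeans code.

If kmeans uses multiple random starts (nstart greater than 1), the algorithm returns the partition with the smallest total within-cluster sum of squares.

(The output includes the total within-cluster sum of squares as tot.withinss.)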
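The "keep the best of several starts" behaviour can be imitated by hand. The sketch below is not the library's actual code: best_of_starts is a hypothetical helper I made up for illustration, it uses stats::kmeans from base R, and the toy data are modelled on the ?kmeans example. It runs several single-start fits and keeps the one with the smallest tot.withinss.

```r
# Toy data, modelled on the example in ?kmeans: two 2-D Gaussian blobs.
set.seed(42)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Hypothetical helper: emulate nstart with repeated single-start runs.
best_of_starts <- function(x, k, n_starts) {
  # Each call draws its own random initial centers.
  fits <- lapply(seq_len(n_starts), function(i) {
    kmeans(x, centers = k, nstart = 1, iter.max = 50)
  })
  # Score every fit by its total within-cluster sum of squares.
  scores <- vapply(fits, function(f) f$tot.withinss, numeric(1))
  # Return the fit with the smallest score.
  fits[[which.min(scores)]]
}

set.seed(1)
best <- best_of_starts(x, k = 5, n_starts = 25)
best$tot.withinss
```

In practice you would just pass nstart = 25 to kmeans; the loop only makes the selection rule visible.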
