Kmeans function - amap package - what nstart stands for
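I don't understand what nstart changes in the algorithm.
If centers = 8, the function will partition the data into 8 clusters. But what does nstart change?
Here is the explanation from the documentation: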
centers:
Either the number of clusters or a set of initial cluster centers. If the first, a random set of rows in x are chosen as the initial centers.
nstart:
If centers is a number, how many random sets should be chosen?
Further down, under Details:
The algorithm of Hartigan and Wong (1979) is used by default. Note that some authors use k-means to refer to a specific algorithm rather than the general method: most commonly the algorithm given by MacQueen (1967) but sometimes that given by Lloyd (1957) and Forgy (1965). The Hartigan–Wong algorithm generally does a better job than either of those, but trying several random starts (nstart> 1) is often recommended. In rare cases, when some of the points (rows of x) are extremely close, the algorithm may not converge in the “Quick-Transfer” stage, signalling a warning (and returning ifault = 4). Slight rounding of the data may be advisable in that case.
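nstart is the number of random starts. I can't explain the statistical details, but in their example code the authors of this function chose 25 random starts: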
## random starts do help here with too many clusters
## (and are often recommended anyway!):
(cl <- kmeans(x, 5, nstart = 25))
Unfortunately, ?kmeans does not explain this precisely (in either the stats or the amap package). However, you can get an idea by looking at the kmeans code.
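If kmeans is run with multiple random starts (nstart greater than 1), the algorithm returns the partition with the smallest total within-cluster sum of squares. (The output includes that value as tot.withinss.)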
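To make the "best of several random starts" behaviour concrete, here is a minimal sketch that emulates nstart by hand with stats::kmeans. The toy data and the names runs, tot and best_manual are made up for illustration. This is not the package's internal code, only the same idea: run several single-start fits and keep the one with the smallest tot.withinss.

```r
set.seed(42)

# Toy data (an assumption for illustration): three 2-D blobs of 50 points each
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 2, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 4, sd = 0.3), ncol = 2)
)

# Ten independent fits, each started from one random set of 5 rows of x
runs <- lapply(1:10, function(i) kmeans(x, centers = 5, nstart = 1))

# Total within-cluster sum of squares of each fit
tot <- sapply(runs, function(fit) fit$tot.withinss)

# Keep the fit with the smallest total, which is the selection rule
# described above for nstart > 1
best_manual <- runs[[which.min(tot)]]

print(tot)                       # the values typically differ between starts
print(best_manual$tot.withinss)  # the smallest of them
```

Calling kmeans(x, centers = 5, nstart = 10) performs the same kind of selection in a single call. Its result need not match best_manual exactly, because it draws its own random starting centers.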