Understanding the quality of the KMeans algorithm
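After reading about it, I am trying to understand how this factor (`uf` in the examples below) works. From my examples, I can see that the smaller the value of the factor, the better the quality of the KMeans clustering, i.e. the more balanced its clusters are. But what is the bare mathematical interpretation of this factor? Is it a known quantity?
Here are my examples: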
C1 = 10
C2 = 100
pdd = [(C1,10), (C2, 100)]
n = 2 <-- #clusters
total = 110 <-- #points
uf = 10 * 10 + 100 * 100 = 10100
uf = 10100 * 2 / 12100 = 1.669
C1 = 50
C2 = 60
pdd = [(C1, 50), (C2, 60)]
n = 2
total = 110
uf = 2500 + 3600
uf = 6100 * 2 / 12100 = 1.008
C1 = 1
C2 = 1
pdd = [(C1, 1), (C2, 1)]
n = 2
total = 2
uf = 2
uf = 2 * 2 / (2 * 2) = 1
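The arithmetic in these examples can be reproduced with a short Python sketch. The formula is inferred from the examples themselves (sum of squared cluster sizes, times the number of clusters, divided by the squared total number of points); the function name `uf_from_sizes` is hypothetical and is not taken from any library.

```python
def uf_from_sizes(sizes):
    """Factor from the examples: n * sum(c_i^2) / total^2.

    `sizes` is a list of cluster sizes (point counts per cluster).
    The name and signature are illustrative assumptions.
    """
    n = len(sizes)        # number of clusters
    total = sum(sizes)    # number of points
    if n == 0 or total == 0:
        raise ValueError("need at least one cluster and one point")
    sum_of_squares = sum(c * c for c in sizes)
    return n * sum_of_squares / (total * total)


# The three examples from the question.
for sizes in ([10, 100], [50, 60], [1, 1]):
    print(sizes, f"{uf_from_sizes(sizes):.3f}")
# [10, 100] 1.669
# [50, 60] 1.008
# [1, 1] 1.000
```

The value is 1 for perfectly balanced clusters and grows as the sizes become more uneven.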
It appears to be related to the Gini index, an impurity measure similar in spirit to entropy, which also uses the sum of squared counts.
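To make the connection concrete, here is a short derivation, assuming the factor is computed as in the examples and taking "Gini index" to mean the Gini impurity. The symbols c_i (size of cluster i), N (total number of points), p_i (fraction of points in cluster i) and G are notation introduced here, not part of the original post.

```latex
% Assumed notation: c_i = size of cluster i, N = \sum_i c_i, p_i = c_i / N.
\[
  \mathrm{uf} \;=\; \frac{n \sum_{i=1}^{n} c_i^{2}}{N^{2}}
              \;=\; n \sum_{i=1}^{n} p_i^{2},
  \qquad
  G \;=\; 1 - \sum_{i=1}^{n} p_i^{2}
  \quad\Longrightarrow\quad
  \mathrm{uf} \;=\; n\,(1 - G).
\]
% By the Cauchy-Schwarz inequality, \sum_i p_i^2 \ge 1/n,
% with equality exactly when every p_i = 1/n.
\[
  1 \;\le\; \mathrm{uf} \;\le\; n
\]
```

Under these assumptions the factor equals 1 exactly when all clusters have the same size, and it approaches n as almost all points fall into a single cluster, which matches the behaviour seen in the examples.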
As stated in Cross Validated: Understanding the quality of the KMeans algorithm.