K 均值聚类:根据聚类中数据点的最大大小确定最佳质心数

K-means Clustering: Determine optimal number of centroids based on maximum size of data point in a cluster

我需要找到一组 96 条线的最佳位置和质心数量。这些线由(x,y,z)中两组起点和终点坐标之间的距离表示。

每个质心最多不能超过 10 行。质心坐标的 z 分量也需要设置为零。我的代码如下所示 returns 质心位置并为每个质心分配标签(即线),但有些质心需要超过 10 个,这不是我们想要的。

下面是代码和我尝试的代码的解决方案:

为了解决这个问题,我使用了 python 2.7 中的 Scikit 包提供的 k-mean 聚类。

线段x,y,z的起点坐标如下所示:

Start=[[  3.08534969e+02   3.25666185e+03  -2.20581200e+00]
 [  3.09533447e+02   3.25965789e+03  -2.17015040e+00]
 [  3.08584658e+02   3.25954029e+03  -2.19941120e+00]
 [  3.07585711e+02   3.25667680e+03  -2.12869760e+00]
 [  3.11562101e+02   3.25982736e+03  -1.94063600e+00]
 [  3.11859006e+02   3.25702733e+03  -7.63193600e-01]
 [  3.08145280e+02   3.25572271e+03  -2.19514400e+00]
 [  3.08377680e+02   3.25856225e+03  -2.16009200e+00]
 [  3.10441338e+02   3.25829875e+03  -2.63984720e+00]
 [  3.10367962e+02   3.25389346e+03  -2.11223840e+00]
 [  3.10137910e+02   3.25741340e+03  -2.46794000e+00]
 [  3.09537281e+02   3.25892338e+03  -2.28932720e+00]
 [  3.09344406e+02   3.25543844e+03  -2.40515120e+00]
 [  3.10450645e+02   3.25749069e+03  -2.33870480e+00]
 [  3.10430410e+02   3.25851708e+03  -2.33870480e+00]
 [  3.07783327e+02   3.25704679e+03  -1.12438160e+00]
 [  3.12930506e+02   3.25741935e+03  -1.11980960e+00]
 [  3.12705650e+02   3.25455548e+03  -1.10548400e+00]
 [  3.11434863e+02   3.26262004e+03  -2.07596720e+00]
 [  3.09685407e+02   3.26189987e+03  -2.38137680e+00]
 [  3.08528861e+02   3.26162489e+03  -2.13967040e+00]
 [  3.11014726e+02   3.26106711e+03  -2.12564960e+00]
 [  3.09174753e+02   3.26233603e+03  -2.18020880e+00]
 [  3.12973636e+02   3.26208751e+03  -2.17898960e+00]
 [  3.13502547e+02   3.26047741e+03  -2.09882720e+00]
 [  3.13546812e+02   3.26050827e+03  -2.13052640e+00]
 [  3.09981100e+02   3.26250428e+03  -2.15552000e+00]
 [  3.10161877e+02   3.26164522e+03  -2.20215440e+00]
 [  3.13995539e+02   3.26372098e+03  -2.16801680e+00]
 [  3.11358342e+02   3.26370646e+03  -1.36791680e+00]
 [  3.10769621e+02   3.25945517e+03  -2.30974880e+00]
 [  3.09862741e+02   3.26403583e+03  -2.28201200e+00]
 [  3.08458161e+02   3.25817564e+03  -2.15552000e+00]
 [  3.09435541e+02   3.25825584e+03  -2.28719360e+00]
 [  3.08634345e+02   3.25732574e+03  -1.27922000e+00]
 [  3.09930506e+02   3.25441935e+03  -1.11980960e+00]
 [  3.09705650e+02   3.25755548e+03  -1.10548400e+00]
 [  3.15314631e+02   3.27539678e+03  -2.22013760e+00]
 [  3.14496076e+02   3.27450937e+03  -1.77543440e+00]
 [  3.19773285e+02   3.27483172e+03  -2.36888000e+00]
 [  3.20160936e+02   3.27174365e+03  -1.46758640e+00]
 [  3.17632108e+02   3.27333074e+03  -1.54531040e+00]
 [  3.20633806e+02   3.27121131e+03  -1.73763920e+00]
 [  3.15134081e+02   3.26918375e+03  -1.91716640e+00]
 [  3.14833043e+02   3.27344651e+03  -2.14759520e+00]
 [  3.15386873e+02   3.27457986e+03  -2.80382960e+00]
 [  3.15389504e+02   3.27056319e+03  -3.79961120e+00]
 [  3.09928864e+02   3.25862972e+03  -2.28262160e+00]
 [  3.12981100e+02   3.25850428e+03  -2.15552000e+00]
 [  3.13161877e+02   3.26164522e+03  -2.20215440e+00]
 [  3.09995539e+02   3.26272098e+03  -2.16801680e+00]
 [  3.08358342e+02   3.26470646e+03  -1.36791680e+00]
 [  3.10769621e+02   3.26145517e+03  -2.30974880e+00]
 [  3.09862741e+02   3.26403583e+03  -2.28201200e+00]
 [  3.09458161e+02   3.25917564e+03  -2.15552000e+00]
 [  3.09435541e+02   3.25825584e+03  -2.28719360e+00]
 [  3.06297341e+02   3.25738852e+03  -1.09207280e+00]
 [  3.08746854e+02   3.25800712e+03  -2.22013760e+00]
 [  3.07296526e+02   3.25420303e+03  -2.36095520e+00]
 [  3.09912102e+02   3.25860281e+03  -1.15181360e+00]
 [  3.07497596e+02   3.25625977e+03  -1.15181360e+00]
 [  3.07817203e+02   3.25477321e+03  -2.68404320e+00]
 [  3.05817919e+02   3.25780380e+03  -2.70659840e+00]
 [  3.06587855e+02   3.25373025e+03  -7.75385600e-01]
 [  3.05538448e+02   3.25854594e+03  -2.20611680e+00]
 [  3.06360464e+02   3.25706283e+03  -7.85748800e-01]
 [  3.10994093e+02   3.25396928e+03  -1.83639440e+00]
 [  3.06823412e+02   3.25784599e+03  -1.82328800e+00]
 [  3.04608725e+02   3.25721917e+03  -1.70502560e+00]
 [  3.09470284e+02   3.25645552e+03  -2.23994960e+00]
 [  3.19844699e+02   3.27955840e+03  -3.10040000e+00]
 [  3.19818319e+02   3.27689903e+03  -2.35577360e+00]
 [  3.19928996e+02   3.27901237e+03  -1.79250320e+00]
 [  3.16798820e+02   3.28194784e+03  -2.20855520e+00]
 [  3.15507571e+02   3.27824837e+03  -2.18600000e+00]
 [  3.19601448e+02   3.27198482e+03  -1.69771040e+00]
 [  3.05286609e+02   3.25691985e+03  -2.66697440e+00]
 [  3.01580345e+02   3.25622715e+03  -2.66209760e+00]
 [  3.02511985e+02   3.25794144e+03  -2.67855680e+00]
 [  3.01371515e+02   3.25342679e+03  -1.84249040e+00]
 [  3.00572376e+02   3.25445421e+03  -2.70446480e+00]
 [  3.01669594e+02   3.25676087e+03  -2.26220000e+00]
 [  3.02519127e+02   3.25346957e+03  -2.26220000e+00]
 [  3.05003400e+02   3.25737359e+03  -2.24726480e+00]
 [  3.04039856e+02   3.25785807e+03  -2.60967200e+00]
 [  3.05039933e+02   3.25685696e+03  -2.65173440e+00]
 [  3.04015052e+02   3.25476038e+03  -2.23476800e+00]
 [  3.05000290e+02   3.25470957e+03  -2.26037120e+00]
 [  3.01464646e+02   3.25791521e+03  -2.59290800e+00]
 [  3.04415035e+02   3.25351240e+03  -1.48800800e+00]
 [  3.05436737e+02   3.25702401e+03  -2.25915200e+00]
 [  3.02497966e+02   3.25370431e+03  -2.87149520e+00]
 [  3.05248617e+02   3.25693437e+03  -2.21648000e+00]
 [  3.07342867e+02   3.25343450e+03  -1.94216000e+00]
 [  3.06608725e+02   3.25721917e+03  -1.70502560e+00]
 [  3.06470284e+02   3.25445552e+03  -2.23994960e+00]]

终点坐标x,y,z如下所示:

 End=[[  3.10533447e+02   3.25765789e+03  -2.27015040e+00]
 [  3.10584658e+02   3.25754029e+03  -2.29941120e+00]
 [  3.09585711e+02   3.25767680e+03  -2.22869760e+00]
 [  3.10562101e+02   3.25782736e+03  -2.04063600e+00]
 [  3.09859006e+02   3.25502733e+03  -8.63193600e-01]
 [  3.10145280e+02   3.25772271e+03  -2.29514400e+00]
 [  3.09377680e+02   3.25656225e+03  -2.26009200e+00]
 [  3.09441338e+02   3.25729875e+03  -2.73984720e+00]
 [  3.09367962e+02   3.25589346e+03  -2.21223840e+00]
 [  3.12137910e+02   3.25641340e+03  -2.56794000e+00]
 [  3.11537281e+02   3.25692338e+03  -2.38932720e+00]
 [  3.10344406e+02   3.25743844e+03  -2.50515120e+00]
 [  3.12450645e+02   3.25649069e+03  -2.43870480e+00]
 [  3.12430410e+02   3.25651708e+03  -2.43870480e+00]
 [  3.09783327e+02   3.25504679e+03  -1.22438160e+00]
 [  3.10930506e+02   3.25541935e+03  -1.21980960e+00]
 [  3.10705650e+02   3.25555548e+03  -1.20548400e+00]
 [  3.10434863e+02   3.26062004e+03  -2.17596720e+00]
 [  3.10685407e+02   3.26089987e+03  -2.48137680e+00]
 [  3.10528861e+02   3.26062489e+03  -2.23967040e+00]
 [  3.13014726e+02   3.26306711e+03  -2.22564960e+00]
 [  3.11174753e+02   3.26033603e+03  -2.28020880e+00]
 [  3.10973636e+02   3.26008751e+03  -2.27898960e+00]
 [  3.12502547e+02   3.26147741e+03  -2.19882720e+00]
 [  3.12546812e+02   3.26150827e+03  -2.23052640e+00]
 [  3.11981100e+02   3.26050428e+03  -2.25552000e+00]
 [  3.11161877e+02   3.26064522e+03  -2.30215440e+00]
 [  3.11995539e+02   3.26172098e+03  -2.26801680e+00]
 [  3.10358342e+02   3.26270646e+03  -1.46791680e+00]
 [  3.09769621e+02   3.26045517e+03  -2.40974880e+00]
 [  3.11862741e+02   3.26203583e+03  -2.38201200e+00]
 [  3.10458161e+02   3.26017564e+03  -2.25552000e+00]
 [  3.10435541e+02   3.26025584e+03  -2.38719360e+00]
 [  3.10634345e+02   3.25532574e+03  -1.37922000e+00]
 [  3.10930506e+02   3.25541935e+03  -1.21980960e+00]
 [  3.10705650e+02   3.25555548e+03  -1.20548400e+00]
 [  3.17314631e+02   3.27439678e+03  -2.32013760e+00]
 [  3.16496076e+02   3.27350937e+03  -1.87543440e+00]
 [  3.18773285e+02   3.27683172e+03  -2.46888000e+00]
 [  3.18160936e+02   3.27274365e+03  -1.56758640e+00]
 [  3.15632108e+02   3.27233074e+03  -1.64531040e+00]
 [  3.18633806e+02   3.27221131e+03  -1.83763920e+00]
 [  3.17134081e+02   3.27118375e+03  -2.01716640e+00]
 [  3.16833043e+02   3.27144651e+03  -2.24759520e+00]
 [  3.13386873e+02   3.27257986e+03  -2.90382960e+00]
 [  3.13389504e+02   3.27256319e+03  -3.89961120e+00]
 [  3.10928864e+02   3.26062972e+03  -2.38262160e+00]
 [  3.11981100e+02   3.26050428e+03  -2.25552000e+00]
 [  3.11161877e+02   3.26064522e+03  -2.30215440e+00]
 [  3.11995539e+02   3.26172098e+03  -2.26801680e+00]
 [  3.10358342e+02   3.26270646e+03  -1.46791680e+00]
 [  3.09769621e+02   3.26045517e+03  -2.40974880e+00]
 [  3.11862741e+02   3.26203583e+03  -2.38201200e+00]
 [  3.10458161e+02   3.26017564e+03  -2.25552000e+00]
 [  3.10435541e+02   3.26025584e+03  -2.38719360e+00]
 [  3.07297341e+02   3.25538852e+03  -1.19207280e+00]
 [  3.06746854e+02   3.25700712e+03  -2.32013760e+00]
 [  3.08296526e+02   3.25520303e+03  -2.46095520e+00]
 [  3.08912102e+02   3.25660281e+03  -1.25181360e+00]
 [  3.05497596e+02   3.25525977e+03  -1.25181360e+00]
 [  3.06817203e+02   3.25677321e+03  -2.78404320e+00]
 [  3.06817919e+02   3.25680380e+03  -2.80659840e+00]
 [  3.07587855e+02   3.25573025e+03  -8.75385600e-01]
 [  3.07538448e+02   3.25654594e+03  -2.30611680e+00]
 [  3.07360464e+02   3.25606283e+03  -8.85748800e-01]
 [  3.08994093e+02   3.25596928e+03  -1.93639440e+00]
 [  3.08823412e+02   3.25584599e+03  -1.92328800e+00]
 [  3.05608725e+02   3.25521917e+03  -1.80502560e+00]
 [  3.08470284e+02   3.25545552e+03  -2.33994960e+00]
 [  3.17844699e+02   3.28055840e+03  -3.20040000e+00]
 [  3.18818319e+02   3.27789903e+03  -2.45577360e+00]
 [  3.17928996e+02   3.28001237e+03  -1.89250320e+00]
 [  3.18798820e+02   3.27994784e+03  -2.30855520e+00]
 [  3.17507571e+02   3.28024837e+03  -2.28600000e+00]
 [  3.17601448e+02   3.27398482e+03  -1.79771040e+00]
 [  3.03286609e+02   3.25591985e+03  -2.76697440e+00]
 [  3.03580345e+02   3.25522715e+03  -2.76209760e+00]
 [  3.03511985e+02   3.25594144e+03  -2.77855680e+00]
 [  3.03371515e+02   3.25542679e+03  -1.94249040e+00]
 [  3.01572376e+02   3.25545421e+03  -2.80446480e+00]
 [  3.03669594e+02   3.25576087e+03  -2.36220000e+00]
 [  3.03519127e+02   3.25546957e+03  -2.36220000e+00]
 [  3.04003400e+02   3.25537359e+03  -2.34726480e+00]
 [  3.03039856e+02   3.25585807e+03  -2.70967200e+00]
 [  3.03039933e+02   3.25585696e+03  -2.75173440e+00]
 [  3.03015052e+02   3.25576038e+03  -2.33476800e+00]
 [  3.04000290e+02   3.25570957e+03  -2.36037120e+00]
 [  3.03464646e+02   3.25591521e+03  -2.69290800e+00]
 [  3.03415035e+02   3.25551240e+03  -1.58800800e+00]
 [  3.03436737e+02   3.25602401e+03  -2.35915200e+00]
 [  3.03497966e+02   3.25570431e+03  -2.97149520e+00]
 [  3.03248617e+02   3.25593437e+03  -2.31648000e+00]
 [  3.08342867e+02   3.25543450e+03  -2.04216000e+00]
 [  3.05608725e+02   3.25521917e+03  -1.80502560e+00]
 [  3.08470284e+02   3.25545552e+03  -2.33994960e+00]]

我尝试的代码如下所示:

  from mpl_toolkits.mplot3d import Axes3D
  from sklearn.cluster import KMeans
  import matplotlib.pyplot as plt
  import numpy as np
  numberOfLines = len(Start)
  approximateLinesPerCentroid = (numberOfLines/10) +1
  kmeans = KMeans(n_clusters=approximateLinesPerCentroid, random_state=0).fit(Start)
  fig = plt.figure()
  ax = fig.add_subplot(111, projection='3d')
  for n in range(approximateLinesPerCentroid): 
     kmeans.cluster_centers_[n][2]=0
  for n in range(approximateLinesPerCentroid):
    ax.scatter(kmeans.cluster_centers_[n][0], kmeans.cluster_centers_[n][1], kmeans.cluster_centers_[n][2],color='green', marker='^',s=60)
  for line in range(numberOfLines):
    ax.plot([Start[line][0],End[line][0]], [Start[line][1],End[line][1]], [Start[line][2],End[line][2]], color='grey')
  ax.set_xlabel('X Label')
  ax.set_ylabel('Y Label')
  ax.set_zlabel('Z Label')
  for line in range(numberOfLines):
      centroid = kmeans.labels_[line]        
      CentroidToLabelMap[centroid].append(line)
  print CentroidToLabelMap
  plt.show()

所附图片显示了我尝试的代码的解决方案:Side View

Top View

代码中的CentroidToLabelMap定义为列表格式的列表中每一行对质心的赋值。每个列表代表附加到其质心的线。可以看出,一些质心需要超过 10 条线。

[[0, 1, 2, 3, 4, 6, 8, 10, 11, 12, 13, 14], [37, 38, 40, 41, 42, 43, 44, 75], [76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92], [21, 24, 25, 28, 29, 31, 50, 51, 53], [5, 15, 16, 17, 34, 35, 36], [39, 70, 71, 72, 73, 74], [56, 57, 60, 61, 62, 63, 64, 65, 68, 94], [18, 19, 20, 22, 23, 26, 27, 30, 32, 33, 47, 48, 49, 52, 54, 55], [45, 46], [7, 9, 58, 59, 66, 67, 69, 93, 95]]

常规 KMeans 不是这样工作的。不能对每个集群中的点数做出任何保证或限制。 See this relevant SO question

但是,"variation" 可以以这种方式工作。 See this。不过我之前没试过。

据我所知,现成的 scikit-learn 算法无法做到这一点。