How to detect curve-shaped clusters in a 2D array? Python

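What is the smartest way to detect curves in a 2D dataset? There must be a way to cluster the data points by defining a maximum distance to a neighbor. My goal is to apply a polyfit function to each curve and use the result as a template for similar datasets.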

Data example:

array([[ 0., 0., 0., ..., 2020., 2020., 2020.], [ 51., 76., 194., ..., 1862., 1915., 2021.]])

I found that this can be done with agglomerative clustering. Here is the code and the result:

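import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# Reshape data into an (n_samples, 2) array of (x, y) points.
# `array` is assumed to hold x in column 0 and y in column 1.

a = array[:, 0].flatten()
b = array[:, 1].flatten()

array_new1 = np.column_stack((a, b))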

# Clustering algorithm

# Single-linkage clustering cut at a distance threshold;
# n_clusters must be None when distance_threshold is set.
model = AgglomerativeClustering(n_clusters=None,
                                metric='euclidean',  # named `affinity` before scikit-learn 1.2
                                linkage='single',
                                compute_full_tree=True,
                                distance_threshold=15)
model.fit(array_new1)
labels = model.labels_
n_clusters = len(set(labels))
print(n_clusters)

cmap = plt.get_cmap('rainbow')
colors = [cmap(i) for i in np.linspace(0, 1, n_clusters)]

plt.figure(figsize=(15,15))
# Labels run from 0 to n_clusters - 1, so enumerate from 0.
for i, color in enumerate(colors):
    plt.scatter(array_new1[labels==i,0], array_new1[labels==i,1], color=color)
plt.gca().invert_yaxis()
plt.show()

![](https://i.stack.imgur.com/utwqP.png)

# Plotting result

data = pd.DataFrame({'x' : array_new1[:,0],
                     'y' : array_new1[:,1],
                     'label' : labels})

data = data.sort_values(by='label')

counter = 0
plt.figure(figsize=(15,15))
plt.scatter(5*array[:, 0], array[:, 1])
for i in range(n_clusters):
    cluster = data.loc[data['label'] == i]

    # Fit only clusters with more than 50 and fewer than 1000 points.
    if 50 < len(cluster) < 1000:
        counter += 1

        # Second-degree polynomial fit of y against x.
        z = np.polyfit(cluster['x'], cluster['y'], 2)

        p = np.poly1d(z)
        # tasku_sk is not defined in this snippet; it sets the upper x limit of the fitted curve.
        xp = np.linspace(0, tasku_sk, 50)

        #plt.scatter(cluster['x'], cluster['y'])
        plt.plot(5*xp, p(xp), c='r', lw=4)

plt.gca().invert_yaxis()
plt.show()

print(counter)

![](https://i.stack.imgur.com/AQHOf.png)

22
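
The per-cluster fit in the loop above could also be wrapped in a small helper, which makes it easier to reuse on similar datasets. This is only a sketch: the name `fit_cluster_polynomials` and its parameters are hypothetical, the size limits just mirror the 50 and 1000 used above, and the demo data is synthetic.

```python
import numpy as np

def fit_cluster_polynomials(points, labels, degree=2, min_size=50, max_size=1000):
    """Return {label: np.poly1d} for every cluster whose size lies
    strictly between min_size and max_size (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    fits = {}
    for label in np.unique(labels):
        cluster = points[labels == label]
        if min_size < len(cluster) < max_size:
            coeffs = np.polyfit(cluster[:, 0], cluster[:, 1], degree)
            fits[int(label)] = np.poly1d(coeffs)
    return fits

# Synthetic demo: two 60-point parabolas plus a 10-point stray cluster.
x = np.linspace(0, 100, 60)
curve_a = np.column_stack((x, 0.01 * x**2 + 5))
curve_b = np.column_stack((x, -0.02 * x**2 + 300))
stray = np.column_stack((np.arange(10.0), np.arange(10.0)))
demo_points = np.vstack((curve_a, curve_b, stray))
demo_labels = np.repeat([0, 1, 2], [60, 60, 10])

fits = fit_cluster_polynomials(demo_points, demo_labels)
print(sorted(fits))  # the 10-point cluster is too small and is skipped
```

Each value in `fits` is callable, so `fits[0](xp)` evaluates the fitted curve at new x positions.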

Yes.

It is reportedly the oldest of all clustering algorithms: single-link.
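
To see why single-link matches the "maximum distance to a neighbor" idea: cutting a single-link hierarchy at a distance threshold yields the connected components of the graph that joins every pair of points closer than that threshold. A minimal pure-Python sketch of that view follows. The function name `single_link_labels` and the sample curves are made up for illustration, and the O(n²) pair loop is only reasonable for small inputs.

```python
import math

def single_link_labels(points, max_dist):
    """Give two points the same label whenever a chain of neighbours,
    each closer than max_dist to the next, connects them."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every pair of points that are closer than max_dist.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Renumber the roots as consecutive labels 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Two synthetic curves, 200 units apart vertically, sampled every 5 units in x.
lower = [(x, 0.01 * x * x) for x in range(0, 101, 5)]
upper = [(x, 200 + 0.01 * x * x) for x in range(0, 101, 5)]
sl_labels = single_link_labels(lower + upper, max_dist=15)
print(len(set(sl_labels)))
```

Neighbouring samples on one curve are less than 15 apart while the curves themselves are 200 apart, so each curve ends up as its own cluster.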