How to detect curve-shaped clusters in a 2D array? (Python)
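What is the smartest way to detect curves in a 2D dataset? There must be a way to cluster data points by defining a maximum distance to a neighbour. My goal is to apply the polyfit function to each curve and use this template on similar datasets.
Data sample: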
array([[ 0., 0., 0., ..., 2020., 2020., 2020.],
[ 51., 76., 194., ..., 1862., 1915., 2021.]])
I found that this can be done with agglomerative clustering; here is the code and the results:
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # Reshape data: `array` is expected to hold one (x, y) point per row
    a = np.asarray(array)[:, 0].ravel()
    b = np.asarray(array)[:, 1].ravel()
    array_new1 = np.column_stack((a, b))

    # Clustering algorithm: single linkage, cut at a maximum neighbour distance.
    # The metric defaults to Euclidean; the original passed affinity='euclidean',
    # a parameter that newer scikit-learn releases renamed to `metric`.
    model = AgglomerativeClustering(n_clusters=None,
                                    linkage='single',
                                    compute_full_tree=True,
                                    distance_threshold=15)
    model.fit(array_new1)
    labels = model.labels_
    n_clusters = len(set(labels))
    print(n_clusters)

    cmap = plt.get_cmap('rainbow')
    colors = [cmap(i) for i in np.linspace(0, 1, n_clusters)]
    plt.figure(figsize=(15, 15))
    # Labels run from 0 to n_clusters - 1, so enumerate from 0
    for i, color in enumerate(colors):
        plt.scatter(array_new1[labels == i, 0], array_new1[labels == i, 1], color=color)
    plt.gca().invert_yaxis()
    plt.show()
![](https://i.stack.imgur.com/utwqP.png)
    import pandas as pd

    # Plotting result
    data = pd.DataFrame({'x': array_new1[:, 0],
                         'y': array_new1[:, 1],
                         'label': labels})
    data = data.sort_values(by='label')
    counter = 0
    plt.figure(figsize=(15, 15))
    plt.scatter(5*array[:, 0], array[:, 1])
    for i in range(n_clusters):
        cluster = data.loc[data['label'] == i]
        # Fit only clusters with more than 50 and fewer than 1000 points
        if 50 < len(cluster) < 1000:
            counter += 1
            # Second-degree polynomial fit of y against x
            z = np.polyfit(cluster['x'], cluster['y'], 2)
            p = np.poly1d(z)
            # tasku_sk is not defined in this snippet; it sets the upper x limit of the drawn curve
            xp = np.linspace(0, tasku_sk, 50)
            # plt.scatter(cluster['x'], cluster['y'])
            plt.plot(5*xp, p(xp), c='r', lw=4)
    plt.gca().invert_yaxis()
    plt.show()
    print(counter)
![](https://i.stack.imgur.com/AQHOf.png)
22
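The question also mentions reusing the fitted polynomials as a template for similar datasets, but the post does not show how. One possible reading, sketched here under assumptions: keep the `np.poly1d` objects and give each point of a new dataset to the curve with the smallest vertical residual, rejecting points that are too far from every curve. The helper `assign_to_template`, the tolerance `max_residual`, and the two example polynomials are illustrative and not part of the original code.

```python
import numpy as np


def assign_to_template(polys, points, max_residual):
    """Return, for each (x, y) point, the index of the closest curve, or -1."""
    x, y = points[:, 0], points[:, 1]
    # residuals[k, i] is the vertical distance of point i from curve k
    residuals = np.abs(np.array([p(x) for p in polys]) - y)
    best = residuals.argmin(axis=0)
    # Points farther than max_residual from every curve are left unassigned
    best[residuals.min(axis=0) > max_residual] = -1
    return best


# Hypothetical template: two parabolas standing in for the fitted curves
polys = [np.poly1d([0.01, 0.0, 0.0]), np.poly1d([0.01, 0.0, 100.0])]
new_points = np.array([[10.0, 2.0], [50.0, 124.0], [30.0, 60.0]])
assigned = assign_to_template(polys, new_points, max_residual=15)
print(assigned)
```

Vertical residuals are cheap to compute but overstate the distance where a curve is steep, so a perpendicular distance may be a better choice for strongly sloped curves.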
Yes.
Supposedly the oldest of all clustering algorithms: single-link.
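To make the link to the "maximum distance to a neighbour" in the question concrete: cutting a single-link tree at a threshold gives the same groups as the connected components of a graph that joins every pair of points closer than that threshold. The sketch below shows this with NumPy only. The function `cluster_by_max_gap` and the two synthetic parabolas are assumptions made for illustration; they are not taken from the post.

```python
import numpy as np


def cluster_by_max_gap(points, max_gap):
    """Label points so that two points share a label whenever a chain of
    neighbours, each less than max_gap apart, connects them."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path on the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances: O(n^2) memory, fine for a few thousand points
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Merge the groups of every pair (i < j) that is closer than max_gap
    for i, j in zip(*np.where(np.triu(dist < max_gap, k=1))):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    roots = np.array([find(i) for i in range(n)])
    # Renumber the roots to consecutive labels 0, 1, 2, ...
    return np.unique(roots, return_inverse=True)[1]


# Synthetic stand-in for the real data: two parabolas 100 units apart in y
x = np.linspace(0, 100, 201)
lower = np.column_stack((x, 0.01 * x**2))
upper = np.column_stack((x, 0.01 * x**2 + 100))
points = np.vstack((lower, upper))

labels = cluster_by_max_gap(points, max_gap=15)
fits = [np.polyfit(points[labels == k, 0], points[labels == k, 1], 2)
        for k in np.unique(labels)]
for k, coeffs in enumerate(fits):
    print(k, np.sum(labels == k), np.round(coeffs, 3))
```

For real datasets the library implementation is the better choice, since the pairwise distance matrix here grows quadratically with the number of points. The sketch also inherits the known weakness of single-link: a few stray points between two curves are enough to chain them into one cluster.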