Get the elements in each cluster

I have the following code, which extracts two features (tempo and slotID) from a CSV file and plots a k-means clustering based on them.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

df = pd.read_csv("prova.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = df.groupby('slotID', as_index=False)['tempo'].mean()

df = dfSlotMean[['tempo','slotID']]

###############################################
Sum_of_squared_distances = []

K = range(1,15)
for k in K:
    km = KMeans(n_clusters=k)
    km = km.fit(df)
    Sum_of_squared_distances.append(km.inertia_)


plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()
################################################

kmeans = KMeans(n_clusters=5).fit(df)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
print(centroids)

# plt.scatter(df['tempo'], df['slotID'], c= kmeans.labels_.astype(float), s=25, alpha=0.5)
# plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
# plt.show()

plt.scatter(df['slotID'], df['tempo'], c= kmeans.labels_.astype(float), s=25, alpha=0.5)
plt.title('Martedì')
plt.scatter(centroids[:, 1], centroids[:, 0], c='red', s=25)
plt.show()
print(pd.Series(labels).value_counts())

What I want to do now is get the values assigned to each cluster. How can I do that? This is the output of the code:

In short, I would like to get, for example, that the points belonging to cluster 1 are: 131,98; 135,76 sec...

Use boolean indexing on the DataFrame to get the data you want. For example, if you want the points from cluster 1, you can get them with:

df[labels == 1]

If you want to get the points of every cluster:

import numpy as np

# Print the rows of df that belong to each cluster label in turn.
for i in np.unique(labels):
    print(df[labels == i])
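If you would rather keep the points in a structure than print them, one option is to attach the labels as a column and group on it. The following is only a sketch under assumptions: the small DataFrame and the hard-coded `labels` array are made-up stand-ins for the question's `df` and `kmeans.labels_`, and the names `clustered` and `points_by_cluster` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's df (mean tempo per slotID).
df = pd.DataFrame({
    "tempo": [98.0, 76.0, 12.0, 15.0, 250.0, 240.0],
    "slotID": [131, 135, 20, 22, 300, 310],
})

# Made-up stand-in for kmeans.labels_: one cluster label per row of df.
labels = np.array([1, 1, 0, 0, 2, 2])

# Work on a copy so the label column is not fed into a later KMeans fit.
clustered = df.copy()
clustered["cluster"] = labels

# Map each cluster label to the list of (slotID, tempo) pairs it contains.
points_by_cluster = {
    label: list(zip(group["slotID"], group["tempo"]))
    for label, group in clustered.groupby("cluster")
}

for label, points in points_by_cluster.items():
    print(f"cluster {label}: {points}")
```

With the real data you would replace the stand-in `labels` with `kmeans.labels_`; `points_by_cluster[1]` then holds the points of cluster 1.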

This works, but can you explain the two lines below? Why are encoding and sep used, and why is the slot mean needed at the start?

df = pd.read_csv("prova.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = df.groupby('slotID', as_index=False)['tempo'].mean()
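A small sketch may help show what these two lines do. It assumes the file is semicolon-separated, contains non-ASCII text such as "Martedì", and has several rows per slotID; the in-memory data and the `giorno` column are invented for illustration and are not taken from prova.csv.

```python
import io
import pandas as pd

# Invented file contents: semicolon-separated, ISO-8859-1 encoded bytes,
# with more than one row for slotID 1.
raw = (
    "slotID;tempo;giorno\n"
    "1;10;Martedì\n"
    "1;20;Martedì\n"
    "2;30;Martedì\n"
).encode("ISO-8859-1")

# sep=';' splits columns on semicolons instead of the default comma.
# encoding tells pandas how to decode the bytes; pandas assumes UTF-8 by
# default, which cannot decode the ISO-8859-1 byte used for "ì".
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1", sep=";")

# Collapse the repeated rows into one row per slotID holding the mean tempo.
# as_index=False keeps slotID as a regular column rather than the index.
dfSlotMean = df.groupby("slotID", as_index=False)["tempo"].mean()
print(dfSlotMean)
```

After the groupby, each slotID appears once, so KMeans clusters one (tempo, slotID) point per slot instead of one point per raw row. Whether that averaging is what you want depends on your data.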