Get the elements in each cluster
I have the following code, which extracts two features (tempo and slotID) from a CSV file and plots a k-means clustering based on those two features.
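import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

df = pd.read_csv("prova.csv", encoding = "ISO-8859-1", sep = ';')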
dfSlotMean = df.groupby('slotID', as_index=False)['tempo'].mean()
df = dfSlotMean[['tempo','slotID']]
###############################################
Sum_of_squared_distances = []
K = range(1,15)
for k in K:
    km = KMeans(n_clusters=k)
    km = km.fit(df)
    Sum_of_squared_distances.append(km.inertia_)
plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()
################################################
kmeans = KMeans(n_clusters=5).fit(df)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
print(centroids)
# plt.scatter(df['tempo'], df['slotID'], c= kmeans.labels_.astype(float), s=25, alpha=0.5)
# plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
# plt.show()
plt.scatter(df['slotID'], df['tempo'], c= kmeans.labels_.astype(float), s=25, alpha=0.5)
plt.title('Martedì')
plt.scatter(centroids[:, 1], centroids[:, 0], c='red', s=25)
plt.show()
print(pd.Series(labels).value_counts())
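What I would like to do now is get the values assigned to each cluster. How can I do that?
This is the output of the code:
In short, I would like to see, for example, that the points belonging to cluster number 1 are: 131,98; 135,76; and so on.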
Use DataFrame indexing to get the data you need. For example, if you want the points from cluster 1, you can get them with
df[labels == 1]
If you want to get the points of every cluster:
for i in np.unique(labels):
    print(df[labels == i])
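As a self-contained sketch of the same idea, the snippet below uses a small made-up DataFrame and a hard-coded `labels` array standing in for `kmeans.labels_` (both are assumptions for illustration, not the data from `prova.csv`). It shows the boolean-mask approach, a dictionary holding every cluster's points, and an equivalent `groupby` on the labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the averaged tempo per slotID.
df = pd.DataFrame({
    "tempo": [131.0, 135.0, 128.0, 20.0, 22.0, 25.0],
    "slotID": [98, 76, 90, 5, 8, 3],
})

# Assumed stand-in for kmeans.labels_: one cluster label per row of df.
labels = np.array([0, 0, 0, 1, 1, 1])

# Boolean mask: keep only the rows assigned to cluster 0.
points_in_cluster_0 = df[labels == 0]

# All clusters at once: cluster label -> DataFrame of its points.
points_by_cluster = {int(i): df[labels == i] for i in np.unique(labels)}

# Equivalent approach: attach the labels as a column, then group by it.
labelled = df.assign(cluster=labels)
for cluster_id, group in labelled.groupby("cluster"):
    print(f"cluster {cluster_id}:")
    print(group[["tempo", "slotID"]].to_string(index=False))
```

Keeping the labels as a column (`labelled`) is convenient if you later want to save or filter the assignments, since each point then carries its cluster with it.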
This works, but can you explain the two lines below? Why are encoding and sep used, and why is the per-slot mean needed at the start?
df = pd.read_csv("prova.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = df.groupby('slotID', as_index=False)['tempo'].mean()
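A hedged sketch of what those two lines do, using a tiny made-up file held in memory rather than the real `prova.csv` (the values and the extra `giorno` column are assumptions). `sep=';'` tells pandas the fields are separated by semicolons rather than commas, `encoding="ISO-8859-1"` tells it how to decode the file's bytes (relevant for accented characters such as the `ì` in `Martedì`), and `groupby(...).mean()` collapses repeated rows for the same `slotID` into one row holding the average `tempo`. Whether the original file actually needs these settings depends on how it was exported.

```python
import io
import pandas as pd

# Hypothetical file contents: semicolon-separated and Latin-1 encoded.
raw = (
    "slotID;tempo;giorno\n"
    "1;10;Martedì\n"
    "1;20;Martedì\n"
    "2;30;Martedì\n"
).encode("ISO-8859-1")

# sep=';' splits the columns; encoding decodes the accented character.
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1", sep=";")

# One row per slotID, with tempo averaged over that slot's rows.
dfSlotMean = df.groupby("slotID", as_index=False)["tempo"].mean()
print(dfSlotMean)
```

With this input, slot 1 has two rows (tempo 10 and 20) that are averaged into a single point, so the clustering sees one point per slot instead of one per raw measurement.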