K_means 聚类中的这些代码行是什么意思?

What does these lines of codes in K_means clustering means?

我正在学习 K 均值聚类。并且对 plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1') 的工作感到很困惑代码中 X[y_kmeans == 0, 0], X[y_kmeans == 0, 1] 的目的是什么?

完整代码在这里

#k-means

#importing libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#importing the dataset
dataset = pd.read_csv("mall_customers.csv")
X = dataset.iloc[:,[3,4]].values

#using the elbow method to find the optimal number of clusters
from sklearn.cluster import KMeans
wcss = [] #Within-Cluster Sum of Square

for i in range(1,11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++',max_iter = 300,n_init=10,random_state = 0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

plt.plot(range(1,11),wcss)
plt.title("The elbow method")
plt.xlabel("Number of cluster")
plt.ylabel('Wcss') 
plt.show()    

#applying kmeans to all dataset
kmeans = KMeans(n_clusters = 5,init = 'k-means++', max_iter=300,n_init=10,random_state=0)
y_kmeans = kmeans.fit_predict(X)

#Visualising the cluster
plt.scatter(X[y_kmeans == 0,0],X[y_kmeans == 0,1],s=100,c = 'red' ,label='Cluster1')
plt.scatter(X[y_kmeans == 1,0],X[y_kmeans == 1,1],s=100,c='blue', label='Cluster2')
plt.scatter(X[y_kmeans == 2,0],X[y_kmeans == 2,1],s=100,c='green',label='Cluster3')
plt.scatter(X[y_kmeans == 3,0],X[y_kmeans == 3,1],s=100, c ='cyan',label = 'CLuster4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.scatter(kmeans.cluster_centers_[:,0],kmeans.cluster_centers_[:,1],s=300, c = 'yellow', label ='Centroids')

plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()

我添加了输出图像以供参考 elbow graph, Final cluster image

这是一个过滤器。 y_kmeans == 0选择那些y_kmeans[i]等于0的元素。X[y_kmeans == 0, 0]选择X中对应y_kmeans值为0且第二维为0的元素。

最初由 tim roberts

回答

X[y_hc ==1,0] 这里 0 表示模型在 x 平面中 X[y_hc == 0,1] 表示模型在 y 平面中。 其中 1 指的是 [i] 的值或簇值。