Problem with the size of the data in a scatter plot
I want to apply KMeans to the Wholesale customers data set, available at:
https://archive.ics.uci.edu/ml/datasets/wholesale+customers
My code so far is as follows:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
data = pd.read_csv('Wholesale customers data.csv')
cont_features = ['Fresh', 'Milk', 'Grocery', 'Frozen', 'Detergents_Paper', 'Delicassen']
dataS=data[cont_features]
mms = MinMaxScaler()
mms.fit(dataS)
data_norm = mms.transform(dataS)
dataNorm=pd.DataFrame(data_norm,columns=cont_features)
kmeans = KMeans(n_clusters=5).fit(data)
centroids = kmeans.cluster_centers_
labels = kmeans.predict(data)
data=data.iloc[:,[3,4]].values #only to select two features for visualizing the scatter plot
plt.scatter(data[labels==0, 0], data[labels==0, 1], s=10, c='red', label ='Cluster 1')
plt.scatter(data[labels==1, 0], data[labels==1, 1], s=10, c='blue', label ='Cluster 2')
plt.scatter(data[labels==2, 0], data[labels==2, 1], s=10, c='green', label ='Cluster 3')
plt.scatter(data[labels==3, 0], data[labels==3, 1], s=10, c='cyan', label ='Cluster 4')
plt.scatter(data[labels==4, 0], data[labels==3, 1], s=10, c='cyan', label ='Cluster 5')
plt.scatter(centroids[:, 0], centroids[:, 1], s=10, c='yellow', label = 'Centroids')
plt.title('Clusters')
plt.xlabel('Frozen')
plt.ylabel('Detergent')
plt.show()
The problem is that when I run my code, I get the following error:
x and y must be the same size
My plot looks like this:
I can't find the error. Can anyone help?
plt.scatter(data[labels==4, 0], data[labels==4, 1], s=10, c='cyan', label ='Cluster 5')
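In your version, the label mask for the y values on this line was 3. It has to be 4, the same as the mask for the x values.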
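To see why the mismatched masks produce "x and y must be the same size", here is a minimal sketch. The `points` and `labels` arrays are made-up stand-ins for your two-column `data` and the KMeans labels, not values from the Wholesale customers file.

```python
import numpy as np

# Hypothetical stand-ins for the two-column `data` array and the KMeans `labels`.
points = np.arange(20, dtype=float).reshape(10, 2)
labels = np.array([0, 1, 2, 3, 3, 3, 4, 4, 0, 1])

# Mismatched masks: x comes from cluster 4, y comes from cluster 3.
x_wrong = points[labels == 4, 0]  # cluster 4 has 2 points
y_wrong = points[labels == 3, 1]  # cluster 3 has 3 points
print(x_wrong.shape, y_wrong.shape)  # (2,) (3,)

# Matching masks: x and y always have the same length.
x_ok = points[labels == 4, 0]
y_ok = points[labels == 4, 1]
assert x_ok.shape == y_ok.shape
```

Passing `x_wrong` and `y_wrong` to `plt.scatter` fails because the two clusters contain a different number of points. If two clusters happened to have the same size, the call would succeed but would silently plot wrong pairs.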
Your plot:
plt.scatter(data[labels==3, 0], data[labels==3, 1], s=10, c='cyan', label ='Cluster 4')
plt.scatter(data[labels==4, 0], data[labels==3, 1], s=10, c='cyan', label ='Cluster 5')
# also, these two clusters use the same color ('cyan'), so they cannot be told apart
plt.legend() # you also forgot this call, so the labels are never displayed
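One way to avoid this kind of copy-paste slip is to loop over the clusters and reuse a single mask for both coordinates. The following is only a sketch: `plot_clusters` is a hypothetical helper, and the points, labels, and centroids are synthetic, so substitute your own two-column array, `kmeans.labels_`, and the matching two columns of `kmeans.cluster_centers_`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_clusters(points, labels, centroids, colors):
    """Scatter each cluster, reusing one boolean mask for x and y."""
    fig, ax = plt.subplots()
    for k, color in enumerate(colors):
        mask = labels == k  # the same mask selects both coordinates
        ax.scatter(points[mask, 0], points[mask, 1],
                   s=10, c=color, label=f"Cluster {k + 1}")
    ax.scatter(centroids[:, 0], centroids[:, 1],
               s=30, c="yellow", edgecolors="black", label="Centroids")
    ax.set_title("Clusters")
    ax.legend()  # without this call the labels are never shown
    return fig, ax


# Synthetic example data (assumption: 50 points in 2 dimensions, 5 clusters).
rng = np.random.default_rng(0)
points = rng.random((50, 2))
labels = np.arange(50) % 5
centroids = np.array([points[labels == k].mean(axis=0) for k in range(5)])
colors = ["red", "blue", "green", "cyan", "magenta"]  # one distinct color per cluster

fig, ax = plot_clusters(points, labels, centroids, colors)
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` afterwards. Note also that `cluster_centers_` has one column per feature used in `fit`, in the same order, so the centroid columns you plot should be the same two features you selected for the points (in your code the points use columns 3 and 4 while the centroids use columns 0 and 1).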