Cannot plot K-Means clusters for one-dimensional data
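I am trying to apply the K-Means algorithm to my binary classification task, but I cannot produce a scatter plot of the two resulting clusters.
My dataset simply has the following form: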
# size, class
312, 1
319, 1
227, 0
Minimal example:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
X = {'size': [312,319,227,301,273,311,277,291,303,381], 'class': [1,1,0,1,0,1,0,0,1,1]}
X = pd.DataFrame(data=X)
X_train, X_test, y_train, y_test = train_test_split(X['size'], X['class'], test_size=0.4)
X_train = X_train.values.reshape(-1,1)
X_test = X_test.values.reshape(-1,1)
kmeans = KMeans(init="k-means++", n_clusters=2, n_init=10, max_iter=300, random_state=42)
kmeans.fit(X_train)
preds = kmeans.predict(X_test)
How can I draw a scatter plot that shows the two clusters, with the samples in X_test colored by their predicted cluster (0 or 1) from preds?
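Because you have only one feature, all of your data lies on a single line. You can create the scatter plot by placing every point at y = 0: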
import matplotlib.pyplot as plt
color = ["blue", "red"]  # one color per cluster label
plt.scatter(X_test.flatten(), [0]*len(X_test), c=[color[p] for p in preds])
plt.show()
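For reference, the question's code and the one-dimensional plot can be combined into a single script, sketched below. The headless Agg backend, the random_state passed to train_test_split, the black "x" markers for the cluster centers, and the output filename clusters_1d.png are assumptions added for illustration; none of them come from the original post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "size": [312, 319, 227, 301, 273, 311, 277, 291, 303, 381],
    "class": [1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
})

# random_state on the split is an assumption, added only for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    data["size"], data["class"], test_size=0.4, random_state=0
)
# KMeans expects a 2-D array, so the single feature becomes one column
X_train = X_train.values.reshape(-1, 1)
X_test = X_test.values.reshape(-1, 1)

kmeans = KMeans(init="k-means++", n_clusters=2, n_init=10, max_iter=300, random_state=42)
kmeans.fit(X_train)
preds = kmeans.predict(X_test)

color = ["blue", "red"]
fig, ax = plt.subplots()
# Every sample is drawn at y = 0, so the points form one horizontal line
ax.scatter(X_test.flatten(), [0] * len(X_test), c=[color[p] for p in preds])
# Mark the two fitted cluster centers on the same line
ax.scatter(kmeans.cluster_centers_.flatten(), [0, 0], c="black", marker="x")
ax.set_xlabel("size")
ax.set_yticks([])  # the y-axis carries no information in the 1-D case
fig.savefig("clusters_1d.png")
```

Note that the cluster labels 0 and 1 are arbitrary and need not match the values in the class column, so "blue" does not necessarily correspond to class 0.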
If you want two features instead, you can change your data like this:
X = {
    'size_1': [312,319,227,301,273,311,277,291,303,381],
    'size_2': [152,165,301,145,310,145,315,156,160,165],
    'class': [1,1,0,1,0,1,0,0,1,1],
}
X = pd.DataFrame(data=X)
X_train, X_test, y_train, y_test = train_test_split(X[['size_1', 'size_2']], X['class'], test_size=0.4)
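Then, after refitting kmeans on the new X_train and recomputing preds for the new X_test, change the scatter plot to:
plt.scatter(X_test.iloc[:, 0], X_test.iloc[:, 1], c=[color[p] for p in preds])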
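The two-feature variant only works if preds comes from a model fitted on the two-column data, a step the snippets above leave implicit. A minimal end-to-end sketch follows. The Agg backend, the random_state on the split, the center markers, and the filename clusters_2d.png are assumptions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

data = pd.DataFrame({
    "size_1": [312, 319, 227, 301, 273, 311, 277, 291, 303, 381],
    "size_2": [152, 165, 301, 145, 310, 145, 315, 156, 160, 165],
    "class": [1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
})

# Two columns already form a 2-D input, so no reshape is needed here.
# random_state on the split is an assumption, added only for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    data[["size_1", "size_2"]], data["class"], test_size=0.4, random_state=0
)

# Refit on the two-feature data; predictions from a 1-D model cannot be reused
kmeans = KMeans(init="k-means++", n_clusters=2, n_init=10, max_iter=300, random_state=42)
kmeans.fit(X_train)
preds = kmeans.predict(X_test)

color = ["blue", "red"]
fig, ax = plt.subplots()
ax.scatter(X_test.iloc[:, 0], X_test.iloc[:, 1], c=[color[p] for p in preds])
# Mark the fitted cluster centers in the same feature space
centers = kmeans.cluster_centers_
ax.scatter(centers[:, 0], centers[:, 1], c="black", marker="x")
ax.set_xlabel("size_1")
ax.set_ylabel("size_2")
fig.savefig("clusters_2d.png")
```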