Plotting the KMeans Cluster Centers for every iteration in Python

Question

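I created a dataset with 6 clusters, visualized it with the code below, and found the cluster center points for every iteration. Now I want to visualize how the KMeans algorithm updates the cluster centroids. The demonstration should cover the first four iterations, drawn as a 2×2 grid of axes. I have the points, but I can't plot them. Could you look at my code and help me write the scatter-plot code?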

Here is my code so far:

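import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter magic; drop this line when running as a plain script
%matplotlib inline
from sklearn.datasets import make_blobs

# data[0] holds the samples, data[1] the true cluster labels
data = make_blobs(n_samples=200, n_features=8,
                  centers=6, cluster_std=1.8, random_state=101)
data[0].shape
# Plot the first two of the eight features, colored by true label
plt.scatter(data[0][:, 0], data[0][:, 1], c=data[1], cmap='brg')

plt.show()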
from sklearn.cluster import KMeans

print("First iteration points:")
kmeans = KMeans(n_clusters=6,random_state=0,max_iter=1)
kmeans.fit(data[0])
centroids=kmeans.cluster_centers_
print(kmeans.cluster_centers_)
print("Second iteration points:")
kmeans = KMeans(n_clusters=6,random_state=0,max_iter=2)
kmeans.fit(data[0])
print(kmeans.cluster_centers_)
print("Third iteration points:")
kmeans = KMeans(n_clusters=6,random_state=0,max_iter=3)
kmeans.fit(data[0])
print(kmeans.cluster_centers_)
print("Fourth iteration points:")
kmeans = KMeans(n_clusters=6,random_state=0,max_iter=4)
kmeans.fit(data[0])
print(kmeans.cluster_centers_)

Answer 1

You can do this with plt.scatter() and plt.subplots(), as follows:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# make_blobs returns (X, y): the samples and their true cluster labels
X, y = make_blobs(n_samples=200, n_features=8,
                  centers=6, cluster_std=1.8, random_state=101)

fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))

# ax is a 2x2 array of axes; ax.flat walks it row by row
for i, axis in enumerate(ax.flat):
    axis.set_title(f"{i + 1} iteration points:")
    # Refit with the same seed, stopping after at most i + 1 iterations
    kmeans = KMeans(n_clusters=6, random_state=0, max_iter=i + 1)
    kmeans.fit(X)
    centroids = kmeans.cluster_centers_
    # Only the first two of the eight features are drawn
    axis.scatter(X[:, 0], X[:, 1], c=y, cmap='brg')
    axis.scatter(centroids[:, 0], centroids[:, 1], s=200, c='black')

plt.show()

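This produces a 2×2 grid of scatter plots, one per max_iter setting, with the cluster centers drawn as large black markers over the data.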
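One caveat when refitting once per panel: depending on the scikit-learn version, the default n_init may run several initializations and keep the best one, so the four fits are not guaranteed to follow a single trajectory. The sketch below is one possible way to pin that down, not the only one. It passes an explicit array of starting centers together with n_init=1, so each fit begins from the same place and only max_iter changes. The names init_centers, history and inertias, the seed 0, and the hollow gray markers for the starting positions are illustrative assumptions, not part of the original answer.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=200, n_features=8,
                  centers=6, cluster_std=1.8, random_state=101)

# Assumption: start from 6 distinct samples picked at random (seed 0).
rng = np.random.default_rng(0)
init_centers = X[rng.choice(len(X), size=6, replace=False)]

history = []   # centroids after at most 1, 2, 3 and 4 iterations
inertias = []  # within-cluster sum of squares for each fit
for n_iter in range(1, 5):
    # Explicit init array + n_init=1: every fit starts from the same centers
    km = KMeans(n_clusters=6, init=init_centers, n_init=1, max_iter=n_iter)
    km.fit(X)
    history.append(km.cluster_centers_)
    inertias.append(km.inertia_)

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
for n_iter, (axis, centers) in enumerate(zip(axes.flat, history), start=1):
    # Data points: first two of the eight features, colored by true label
    axis.scatter(X[:, 0], X[:, 1], c=y, cmap='brg', s=15)
    # Starting positions as hollow markers, current centroids in black
    axis.scatter(init_centers[:, 0], init_centers[:, 1], s=200,
                 facecolors='none', edgecolors='gray')
    axis.scatter(centers[:, 0], centers[:, 1], s=200, c='black', marker='X')
    axis.set_title(f"Iteration {n_iter}")
# Call plt.show() or fig.savefig(...) afterwards to display the grid.
```

Under these assumptions each panel should continue from the one before it, and the values in inertias should not increase from one panel to the next. If the algorithm converges in fewer than four iterations, the later panels simply repeat the converged centers.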

python

cluster-analysis

data-mining

scikit-learn

data-science