How to plot the cost / inertia values in sklearn kmeans?

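Is it possible to get the kmeans cost values? I want to plot the cost value against the kmeans iterations, as in the figure below.

Can you point me to any related topics? Thanks.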

Inertia in KMeans

By cost, I assume you want to plot the inertia value at each iteration of a KMeans run.

The K-means algorithm aims to choose centroids that minimize the inertia, or within-cluster sum-of-squares criterion. Inertia can be recognized as a measure of how internally coherent clusters are.

This is the quantity KMeans tries to minimize at each iteration.

More details are in the scikit-learn user guide section on K-means.
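To make the definition concrete, here is a minimal sketch that recomputes the inertia by hand from a fitted model and compares it with kmeans.inertia_. The random data, n_clusters=5 and random_state=0 are illustrative assumptions, not part of the question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data (an assumption): 1000 random points in 7 dimensions
rng = np.random.default_rng(0)
X = rng.random((1000, 7))

kmeans = KMeans(n_clusters=5, n_init=1, random_state=0).fit(X)

# Inertia = sum over all points of the squared distance to the assigned centroid
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
manual_inertia = float(np.sum((X - assigned_centers) ** 2))

# The two values should agree up to floating-point rounding
print(manual_inertia, kmeans.inertia_)
```

This only reproduces the final value. The per-iteration values still have to come from the verbose log.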


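Printing the inertia value for each iteration

You can get the final inertia value from kmeans.inertia_ after fitting KMeans(), but if you want the inertia value at each iteration, one way is to set verbose=2.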

import numpy as np
from sklearn.cluster import KMeans

def train_kmeans(X):
    kmeans = KMeans(n_clusters=5, verbose=2, n_init=1)
    kmeans.fit(X)
    return kmeans

X = np.random.random((1000,7))
train_kmeans(X)
Initialization complete
Iteration 0, inertia 545.5728914456803
Iteration 1, inertia 440.5225419317938
Iteration 2, inertia 431.87478970379755
Iteration 3, inertia 427.52125502838504
Iteration 4, inertia 425.75105209622967
Iteration 5, inertia 424.7788124997543
Iteration 6, inertia 424.2111904252263
Iteration 7, inertia 423.7217490965455
Iteration 8, inertia 423.29439165408354
Iteration 9, inertia 422.9243615021072
Iteration 10, inertia 422.54144662407566
Iteration 11, inertia 422.2677910840504
Iteration 12, inertia 421.98686844470336
Iteration 13, inertia 421.76289612029376
Iteration 14, inertia 421.59241427498324
Iteration 15, inertia 421.36516415785724
Iteration 16, inertia 421.23801796298704
Iteration 17, inertia 421.1065220191125
Iteration 18, inertia 420.85788031236586
Iteration 19, inertia 420.6053961581343
Iteration 20, inertia 420.4998816171483
Iteration 21, inertia 420.4436034595902
Iteration 22, inertia 420.39833211852346
Iteration 23, inertia 420.3583721574586
Iteration 24, inertia 420.32684273674226
Iteration 25, inertia 420.2786269304449
Iteration 26, inertia 420.24149714604516
Iteration 27, inertia 420.22255866139835
Iteration 28, inertia 420.2075247585145
Iteration 29, inertia 420.19985517233584
Iteration 30, inertia 420.18983415887305
Iteration 31, inertia 420.18584733421886
Converged at iteration 31: center shift 8.716337631121295e-33 within tolerance 8.370287188573764e-06

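Note: KMeans can re-initialize its centroids multiple times (controlled by n_init), and each initialization runs for at most max_iter iterations. To get a single list of inertia values, you must set n_init=1 so that the model is fitted with a single initialization. If you set n_init to a higher value, the printed output will contain several initializations, each with its own list of iterations.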
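The effect of n_init on the log can be checked with a short sketch. It assumes the log format shown above, where every initialization starts with an "Initialization complete" line. The helper name count_runs, the small random dataset and random_state=0 are made up for illustration.

```python
import contextlib
import io

import numpy as np
from sklearn.cluster import KMeans

# Illustrative data (an assumption): 200 random points in 3 dimensions
X = np.random.default_rng(0).random((200, 3))

def count_runs(n_init):
    """Fit KMeans with the given n_init and count the initializations it logs."""
    buffer = io.StringIO()
    # Capture everything KMeans prints while fitting
    with contextlib.redirect_stdout(buffer):
        KMeans(n_clusters=3, n_init=n_init, verbose=2, random_state=0).fit(X)
    return buffer.getvalue().count("Initialization complete")

single_run = count_runs(1)
three_runs = count_runs(3)
print(single_run, three_runs)
```

With more than one initialization the iteration counter restarts for every run, so a plot of the raw log would mix several curves.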


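Plotting the inertia value for each iteration

The problem is that (as far as I know) sklearn has no parameter for storing these inertia values in a variable. So you may need to write a wrapper around the fit that redirects the verbose stdout into an output variable as text, and then extract the inertia value for each iteration.

You can use StringIO to capture the output printed with verbose=2, extract the values, and plot them.

Here is the complete code: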

import io
import sys
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

#Dummy data
X = np.random.random((1000,7)) 

def train_kmeans(X):
    kmeans = KMeans(n_clusters=5, verbose=2, n_init=1) #<-- n_init=1, verbose=2
    kmeans.fit(X)
    return kmeans

#HELPER FUNCTION
#Calls f(inp) and returns both its return value and whatever it printed to stdout
#In this case, the returned output is the model and the printed output is the verbose inertia at each iteration

def redirect_wrapper(f, inp):
    old_stdout = sys.stdout
    new_stdout = io.StringIO()
    sys.stdout = new_stdout
    try:
        returned = f(inp)                #<- Call function
        printed = new_stdout.getvalue()  #<- store printed output
    finally:
        sys.stdout = old_stdout          #<- always restore stdout, even if f raises
    return returned, printed


returned, printed = redirect_wrapper(train_kmeans, X)

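#Extract inertia values from lines such as "Iteration 3, inertia 427.52"
#Only lines starting with "Iteration" are parsed; rstrip('.') tolerates a period after the number
inertia = [float(line.split('inertia')[1].strip().rstrip('.'))
           for line in printed.splitlines() if line.startswith('Iteration')]

#Plot!
plt.plot(inertia)
plt.xlabel('Iteration')
plt.ylabel('Inertia')
plt.show()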


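Edit: I have updated my answer with a generic helper function that calls a given function (one that both returns and prints something) and returns its printed output and its return value separately. In this case, the return value is the model and the printed output is stored as text in a variable.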
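As a possible variation on that helper, the standard library's contextlib.redirect_stdout restores stdout automatically, even when the wrapped function raises. This is a sketch rather than the code from the answer: the names capture_stdout and noisy_square are hypothetical, and a toy function stands in for the KMeans fit.

```python
import contextlib
import io

def capture_stdout(f, *args, **kwargs):
    """Call f and return (its return value, the text it printed to stdout)."""
    buffer = io.StringIO()
    # stdout is redirected only inside the with-block and restored afterwards
    with contextlib.redirect_stdout(buffer):
        returned = f(*args, **kwargs)
    return returned, buffer.getvalue()

# Toy function (hypothetical) that both prints and returns something
def noisy_square(x):
    print(f"squaring {x}")
    return x * x

returned, printed = capture_stdout(noisy_square, 4)
```

Calling capture_stdout(train_kmeans, X) would then play the same role as redirect_wrapper(train_kmeans, X) above.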