Changing label names of KMeans clusters

Question

I am doing k-means clustering in Python with sklearn. I would like to know how to change the label names generated for the k-means clusters. For example:

data          Cluster
0.2344         1
1.4537         2
2.4428         2
5.7757         3

I would like to get:

data          Cluster
0.2344         black
1.4537         red
2.4428         red
5.7757         blue

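I do not mean simply setting 1 -> black; 2 -> red when printing. I want to know whether the KMeans clustering model itself can be configured to use different cluster names by default.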

Answer 1

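No.
There is no way to change the default labels.
You have to map them separately using a dictionary. You can review all the available methods in the scikit-learn KMeans documentation; none of the available methods or attributes lets you change the default labels.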

Solution using a dictionary:

# Code
a = [0,0,1,1,2,2]
mapping = {0:'black', 1:'red', 2:'blue'}
a = [mapping[i] for i in a]

# Output
['black', 'black', 'red', 'red', 'blue', 'blue']
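
When the labels come straight from a fitted model as a NumPy array, the same lookup can be done in one vectorized step. This is a minimal sketch, assuming 0-based integer labels (the form sklearn's `labels_` attribute takes); the `labels` array below is made-up illustration data, not the output of a real fit.

```python
import numpy as np

# Hypothetical labels, shaped like the `labels_` array of a fitted KMeans.
labels = np.array([0, 0, 1, 1, 2, 2])

# Position i in this array holds the name chosen for cluster i.
names = np.array(["black", "red", "blue"])

# Integer-array indexing looks up the name for every label at once.
named_labels = names[labels]
print(named_labels.tolist())
```

This only works when the labels are consecutive integers starting at 0; for any other label values, the dictionary approach above is the safer choice.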

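If you change the data or the number of clusters, the mapping may no longer hold. To see why, let's first visualize the result.
Code:
Import the libraries and generate random data: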

from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

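# 10 random 2-D points with coordinates between 1 and 100
# (the original call passed 100 as `low` and relied on the default high=1.0)
x = np.random.uniform(1, 100, size=(10, 2))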

Apply the KMeans algorithm:

kmeans = KMeans(n_clusters=3, random_state=0).fit(x)

Get the cluster centers:

arr = kmeans.cluster_centers_

Your cluster centroids look like this:

array([[23.81072765, 77.21281171],
       [ 8.6140551 , 23.15597377],
       [93.37177176, 32.21581703]])

Here, the first row is the centroid of cluster 0, the second row is the centroid of cluster 1, and so on.
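
The correspondence between centroid rows and label values can be checked directly. The following is a small sketch, assuming random 2-D data of the same kind as above; the seed and the explicit `n_init=10` are choices made here for reproducibility, not part of the original answer. It recomputes the nearest centroid row for each point and compares the result with `labels_`.

```python
import numpy as np
from sklearn.cluster import KMeans

# Reproducible random data: 10 points in 2-D (seed chosen for illustration).
rng = np.random.default_rng(0)
x = rng.uniform(1, 100, size=(10, 2))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)

# Distance from every point to every centroid, shape (10, 3).
diffs = x[:, None, :] - kmeans.cluster_centers_[None, :, :]
distances = np.linalg.norm(diffs, axis=2)

# Index of the nearest centroid row for each point.
nearest_row = distances.argmin(axis=1)

# If row i is the centroid of cluster i, these two arrays should agree.
labels_match_rows = np.array_equal(nearest_row, kmeans.labels_)
print(labels_match_rows)
```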

Visualize the centroids and the data:

plt.scatter(x[:,0],x[:,1])
plt.scatter(arr[:,0], arr[:,1])

You will get a scatter plot of the data points with the cluster centroids drawn on top.

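As you can see, you have access to both the centroids and the training data. If your training data, the number of clusters, and random_state stay the same, these centroids will not change.

But if you add more training data or more clusters, you will have to create a new mapping based on the newly generated centroids.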
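
One way to avoid rebuilding the mapping by hand is to derive it from the centroids themselves. The sketch below is only an illustration, not something sklearn provides: the helper `name_clusters_by_centroid` is a hypothetical name, and ordering clusters by the first coordinate of their centroid is an assumed convention that you would adapt to your own data.

```python
import numpy as np
from sklearn.cluster import KMeans


def name_clusters_by_centroid(kmeans, names):
    """Hypothetical helper: assign names in order of each centroid's first coordinate."""
    # Cluster indices sorted from the smallest to the largest first coordinate.
    order = np.argsort(kmeans.cluster_centers_[:, 0])
    return {int(cluster): name for cluster, name in zip(order, names)}


# Three well-separated 1-D groups (illustration data).
x = np.array([[0.2], [0.4], [5.0], [5.2], [9.8], [10.0]])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)

mapping = name_clusters_by_centroid(kmeans, ["black", "red", "blue"])
named = [mapping[int(label)] for label in kmeans.labels_]
print(named)
```

Because the names follow the centroid positions rather than the arbitrary integer labels, the lowest group is always "black" here, whichever label number KMeans happens to give it.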

Answer 2

See the top answer to this related post.

sklearn does not include this functionality, but you can map the values onto your DataFrame in a fairly straightforward way.

current_labels = [1, 2, 3]
desired_labels = ['black', 'red', 'blue']
# create a dictionary for your corresponding values
map_dict = dict(zip(current_labels, desired_labels))
print(map_dict)
# {1: 'black', 2: 'red', 3: 'blue'}

# map the desired values back to the dataframe
# note this will replace the original values
data['Cluster'] = data['Cluster'].map(map_dict)

# alternatively you can map to a new column if you want to preserve the old values
data['NewNames'] = data['Cluster'].map(map_dict)
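
Two details are worth keeping in mind with this approach. sklearn numbers its clusters from 0, so with real `labels_` output the dictionary keys would be 0, 1, 2 rather than 1, 2, 3, and `Series.map` returns NaN for any value that is missing from the dictionary. The following is a minimal sketch using the question's data; the `Cluster` values are filled in by hand as an assumption of what a fit might return, and the column name `ClusterName` is chosen here for illustration.

```python
import pandas as pd

# The question's data, with hand-written 0-based labels (assumed, not from a real fit).
data = pd.DataFrame(
    {"data": [0.2344, 1.4537, 2.4428, 5.7757], "Cluster": [0, 1, 1, 2]}
)

# Keys start at 0 to match sklearn's numbering.
map_dict = {0: "black", 1: "red", 2: "blue"}

# Keep the integer labels and put the names in a new column.
data["ClusterName"] = data["Cluster"].map(map_dict)

# Any label missing from map_dict would show up as NaN; count them.
unmapped = int(data["ClusterName"].isna().sum())
print(data)
print("unmapped labels:", unmapped)
```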

