How to visually compare clusters using Python?
I am working on k-means clustering for customer segmentation. My input data has 12 features and 7315 rows.
So I tried the code below to run k-means:
kmeans = KMeans(n_clusters = 5, init = "k-means++", random_state = 42)
data_normalized['y_kmeans'] = kmeans.fit_predict(data_normalized)
To visualize the result, I tried the code below:
u_labels = np.unique(data_normalized['y_kmeans'])
#plotting the results:
for i in u_labels:
    plt.scatter(data_normalized[y_kmeans == i , 0] , data_normalized[y_kmeans == i , 1] , label = i)
plt.legend()
plt.show()
I get the following error:
TypeError: '(array([False, False, False, ..., False, False, False]), 0)' is an invalid key
InvalidIndexError: (array([False, False, False, ..., False, False, False]), 0)
How can I visualize my clusters to see how far apart they are from each other?
Since I don't have your dataset, I simulated your data frame as follows (I assumed 9 distinct cluster groups):
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# two random features in (0, 1) and a random cluster label from 1 to 9
d = {'col1': [i/100 for i in random.choices(range(1,100), k=7315)],
     'col2': [i/100 for i in random.choices(range(1,100), k=7315)],
     'y_kmeans': random.choices(range(1,10), k=7315)}
data_normalized = pd.DataFrame(d)
You can then plot the clusters as follows:
u_labels = np.unique(data_normalized['y_kmeans']).tolist()
scatter = plt.scatter(data_normalized['col1'], data_normalized['col2'],
c=data_normalized['y_kmeans'], cmap='tab20')
plt.legend(handles=scatter.legend_elements()[0], labels=u_labels)
plt.show()
This gives the following cluster plot.
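As for the error itself: `data_normalized[mask, 0]` is NumPy-style indexing (row mask, column position). On a pandas DataFrame, `[]` treats that tuple as a single column key, which is why it is reported as an invalid key. On a DataFrame you would write something like `data_normalized.loc[mask, 'col1']` instead, or convert to a NumPy array first. Below is a minimal sketch of the original loop working on a NumPy array. Note that the PCA projection is my own addition, not something from your code: with 12 features you need some 2-D projection to plot, and PCA is one common choice. The random data and the names `X`, `coords`, `centres` and `centre_dist` are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Hypothetical stand-in for the real data: 7315 rows, 12 features in [0, 1).
rng = np.random.default_rng(42)
data_normalized = pd.DataFrame(rng.random((7315, 12)),
                               columns=[f"f{j}" for j in range(12)])

# Work on a plain NumPy array and keep the labels in a separate variable.
X = data_normalized.to_numpy()
kmeans = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Project the points and the cluster centres onto 2 principal components.
pca = PCA(n_components=2)
coords = pca.fit_transform(X)                      # shape (7315, 2)
centres = pca.transform(kmeans.cluster_centers_)   # shape (5, 2)

# The original loop, now valid because coords is a NumPy array.
fig, ax = plt.subplots()
for i in np.unique(labels):
    mask = labels == i
    ax.scatter(coords[mask, 0], coords[mask, 1], s=5, label=f"cluster {i}")
ax.scatter(centres[:, 0], centres[:, 1], c="black", marker="x", label="centres")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.legend()

# Pairwise Euclidean distances between centres in the full 12-D space.
c = kmeans.cluster_centers_
centre_dist = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=-1)
print(pd.DataFrame(centre_dist).round(3))
```

Call `plt.show()` afterwards to display the figure. Keep in mind that a 2-D projection only approximates the real separation, so the `centre_dist` table (computed in the original feature space) is the safer thing to read distances from.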