How do I choose the value of k in k-medoids using the elbow method?
I am trying this code:
https://gist.github.com/jaganadhg/9a25fb531df47beb13e3
import pylab as plt
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
iris = load_iris()
k = range(1,11)
clusters = [KMeans(n_clusters = c,init = 'k-means++').fit(iris.data) for c in k]
centr_lst = [cc.cluster_centers_ for cc in clusters]
k_distance = [cdist(iris.data, cent, 'euclidean') for cent in centr_lst]
clust_indx = [np.argmin(kd,axis=1) for kd in k_distance]
distances = [np.min(kd,axis=1) for kd in k_distance]
avg_within = [np.sum(dist)/iris.data.shape[0] for dist in distances]
with_in_sum_square = [np.sum(dist ** 2) for dist in distances]
to_sum_square = np.sum(pdist(iris.data) ** 2)/iris.data.shape[0]
bet_sum_square = to_sum_square - with_in_sum_square
kidx = 2
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(k, avg_within, 'g*-')
ax.plot(k[kidx], avg_within[kidx], marker='o', markersize=12, \
markeredgewidth=2, markeredgecolor='r', markerfacecolor='None')
plt.grid(True)
plt.xlabel('Number of clusters')
plt.ylabel('Average distance to nearest cluster center')
plt.title('Elbow for KMeans clustering (IRIS Data)')
It works without any problem for k-means.
But when I change k-means to k-medoids:
clusters = [KMedoids(n_clusters = c).fit(iris.data) for c in k]
(I am using the pyclust k-medoids implementation: https://github.com/mirjalil/pyclust/blob/master/pyclust/_kmedoids.py)
I get an array of None values:
[None, None, None, None, None, None, None]
What is going wrong?
Can anyone help?
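Because the fit method in the code you downloaded, without understanding it, has no return value. The list comprehension collects whatever fit returns, so every entry is None. scikit-learn's KMeans.fit returns self, which is why the original version works.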
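To make the mechanism concrete, here is a minimal sketch using two hypothetical toy classes (neither is taken from pyclust or scikit-learn). It only shows how a fit method without a return statement turns a list comprehension into a list of None:

```python
class FitReturnsNothing:
    """Hypothetical estimator whose fit() has no return statement."""

    def fit(self, data):
        # State is stored on the object, but nothing is returned.
        self.n_seen_ = len(data)


class FitReturnsSelf:
    """Hypothetical estimator that follows the scikit-learn convention."""

    def fit(self, data):
        self.n_seen_ = len(data)
        return self  # lets Model().fit(X) be used as an expression


data = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]

# The comprehension keeps the value of the whole expression, i.e. what fit() returns.
broken = [FitReturnsNothing().fit(data) for _ in range(3)]
working = [FitReturnsSelf().fit(data) for _ in range(3)]

print(broken)                                # [None, None, None]
print(all(m.n_seen_ == 3 for m in working))  # True
```

In the first case the models are fitted correctly, but the only references to them are discarded as soon as each fit call finishes.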
Add a return self, or return self._clusters, or something similar at the end of the fit method, and it will probably behave as expected, at least for this step.
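If you would rather not edit the library source, one possible workaround is to keep a reference to each object yourself. The sketch below assumes only that the class has a fit method that updates the object in place. ThirdPartyClusterer and fit_and_keep are hypothetical names used for illustration, not part of pyclust:

```python
def fit_and_keep(model, data):
    """Fit the model in place and return the model, whatever fit() returns."""
    model.fit(data)
    return model


class ThirdPartyClusterer:
    """Hypothetical stand-in for a library class whose fit() returns None."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters

    def fit(self, data):
        self.fitted_ = True  # no return statement, so the call evaluates to None


data = [[0.0], [1.0], [5.0], [6.0]]

# Same list-comprehension shape as in the question, but the objects are kept.
models = [fit_and_keep(ThirdPartyClusterer(n_clusters=c), data) for c in range(1, 4)]

print([m.n_clusters for m in models])  # [1, 2, 3]
```

The rest of the script reads cluster_centers_ from each fitted object, so check which attribute your k-medoids class actually uses for the medoids before running the remaining lines.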