k-means algorithm not working
I'm trying to implement the k-means algorithm in Python 3 using NumPy. My input data matrix is a simple n x 2 matrix of points:
[[1, 2],
[3, 4],
...
[7, 13]]
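For some reason, at each step of the iteration, none of my labels stay the same. Every single one is different from the previous step. Does anyone see any glaring errors in what I'm doing? I've added some comments to my code so people can follow the individual steps.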
import numpy as np

def kmeans(X,k):
    # Initialize by choosing k random data points as centroids
    num_features = X.shape[1]
    centroids = X[np.random.randint(X.shape[0], size=k), :] # find k centroids
    iterations = 0
    old_labels, labels = [], []
    while not should_stop(old_labels, labels, iterations):
        iterations += 1
        clusters = [[] for i in range(0,k)]
        for i in range(k):
            clusters[i].append(centroids[i])
        # Label points
        old_labels = labels
        labels = []
        for point in X:
            distances = [np.linalg.norm(point-centroid) for centroid in centroids]
            max_centroid = np.argmax(distances)
            labels.append(max_centroid)
            clusters[max_centroid].append(point)
        # Compute new centroids
        centroids = np.empty(shape=(0,num_features))
        for cluster in clusters:
            avgs = sum(cluster)/len(cluster)
            centroids = np.append(centroids, [avgs], axis=0)
    return labels

def should_stop(old_labels, labels, iterations):
    count = 0
    if len(old_labels) == 0:
        return False
    # Count how many labels changed since the previous iteration
    for i in range(len(labels)):
        count += (old_labels[i] != labels[i])
    print(count)
    if old_labels == labels or iterations == 2000:
        return True
    return False
max_centroid = np.argmax(distances)
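You want to find the centroid that minimizes the distance, not the one that maximizes it. Use np.argmin(distances) there instead.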
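For reference, here is a minimal sketch of what the loop could look like with the nearest-centroid assignment in place. It is not the asker's exact code: the vectorized distance computation, the `seed` and `max_iterations` parameters, returning the centroids alongside the labels, and leaving an empty cluster's centroid where it was are all assumptions made for illustration. It also assumes k is no larger than the number of points.

```python
import numpy as np

def kmeans(X, k, max_iterations=2000, seed=None):
    """Illustrative k-means sketch; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the initial centroids (assumes k <= len(X)).
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iterations):
        # Distance from every point to every centroid, shape (n, k).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its nearest centroid: argmin, not argmax.
        new_labels = np.argmin(distances, axis=1)
        # Stop once the assignment no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        # Assumption: an empty cluster keeps its previous centroid.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Calling `labels, centroids = kmeans(X, 2, seed=0)` on an n x 2 array gives one integer label per row. Note that the code in the question also appends each centroid to its own cluster before averaging, which avoids dividing by zero but pulls the new mean toward the old centroid; the sketch above averages only the data points.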