How to plot the n points furthest from each KMeans centroid
I'm trying to train a kmeans model on the iris dataset in Python.
Is there a way to plot the n points furthest from each centroid using kmeans in Python?
Here is my complete code so far:
from sklearn import datasets
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# import iris dataset
iris = datasets.load_iris()
X = iris.data[:, 2:5]  # use two variables (columns 2 and 3 of the four)

# plot the two variables to check number of clusters
plt.scatter(X[:, 0], X[:, 1])

# kmeans
km = KMeans(n_clusters=2, random_state=0)  # chose two clusters
y_pred = km.fit_predict(X)
X_dist = km.transform(X)  # get distances to each centroid

## Stuck at this point: how to make a function that extracts the three points that are furthest from the two centroids
max3IdxArr = []
for label in np.unique(km.labels_):
    X_label_indices = np.where(y_pred == label)[0]
    max3Idx = X_label_indices[np.argsort(X_dist[:3])]  # This part is wrong
    max3IdxArr.append(max3Idx)
max3IdxArr

# plot
plt.scatter(X[:, 0].iloc[max3IdxArr], X[:, 1].iloc[max3IdxArr])
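What you wrote is np.argsort(X_dist[:3]).
That takes the first three rows of the unsorted X_dist and only then sorts them, so the slice happens too early. Instead, you can
try sorting first with x = np.argsort(X_dist)
and,
once the sorting is done, slicing the result with
x[:3]
Keep in mind that np.argsort sorts in ascending order, so x[:3] gives the three smallest distances; for the furthest points take the last three instead, x[-3:]. Also note that X_dist has one column per centroid, so within the loop you would sort only the column belonging to the current label, restricted to the rows in X_label_indices.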
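To make the sort-then-slice idea concrete, here is a minimal sketch rather than a definitive implementation. The helper name furthest_per_cluster and the small hand-made distance matrix are my own illustrations, not part of the question's code. It assumes the distance matrix has one column per cluster (the shape that KMeans.transform returns) and that the labels are the integers 0..k-1, so a label can be used directly as a column index.

```python
import numpy as np

def furthest_per_cluster(dist, labels, n=3):
    """Map each cluster label to the indices of its n members that lie
    furthest from that cluster's own centroid (furthest first).

    dist   : (n_samples, n_clusters) distances, e.g. from KMeans.transform
    labels : (n_samples,) integer cluster assignments, e.g. KMeans.labels_
    """
    dist = np.asarray(dist)
    labels = np.asarray(labels)
    result = {}
    for label in np.unique(labels):
        members = np.where(labels == label)[0]   # rows assigned to this cluster
        own_dist = dist[members, label]          # distance to their own centroid
        order = np.argsort(own_dist)[::-1]       # sort first, largest distance first
        result[int(label)] = members[order[:n]]  # then slice the top n
    return result

# Hypothetical toy data: 6 points, 2 clusters (assumed values for illustration).
dist = np.array([
    [0.1, 5.0],
    [0.9, 4.0],
    [0.5, 4.5],
    [3.0, 0.2],
    [3.5, 0.7],
    [4.0, 0.4],
])
labels = np.array([0, 0, 0, 1, 1, 1])

far = furthest_per_cluster(dist, labels, n=2)
print(far)
```

With the variables from the question, this would be called as furthest_per_cluster(km.transform(X), km.labels_, n=3). The index arrays can then be joined with idx = np.concatenate(list(far.values())) and plotted with plt.scatter(X[idx, 0], X[idx, 1]); plain NumPy indexing is enough here, since X is a NumPy array and has no .iloc.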
Feel free to ask if this doesn't work.
Cheers