Elbow Method for k-means
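I am working on a clustering task and used the elbow method to find the optimal number of clusters (k), but the plot I got is almost a straight line, so I cannot determine k from it.
[enter image description here][2]
Thanks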
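I suggest using the silhouette score to determine the number of clusters. It does not require you to read a plot and can be fully automated: try different values of k and select the one with the highest silhouette score: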
https://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
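A minimal sketch of that automated selection, assuming scikit-learn's `KMeans` and `silhouette_score`. The helper name `best_k_by_silhouette`, the synthetic data, and the range of k are illustrative choices, not something from the question:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Return (best_k, scores); scores maps each k to its mean silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    # The silhouette score is in [-1, 1]; higher means better-separated clusters.
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Assumption: synthetic data with 4 well-separated groups, for illustration only.
X_demo, _ = make_blobs(
    n_samples=400,
    centers=[(-5, -5), (5, 5), (-5, 5), (5, -5)],
    cluster_std=0.6,
    random_state=42,
)

best_k, scores = best_k_by_silhouette(X_demo, range(2, 9))
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```

The silhouette score is only defined for 2 or more clusters, which is why the search starts at k=2.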
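However, in this particular case it looks like that will not solve your problem.
If the data points are spread very evenly across the space, they do not actually form any clusters, and then there is no optimal value of k.
Take the last row here as an example: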
https://scikit-learn.org/stable/modules/clustering.html
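k-means does technically produce distinct clusters there, but they are not really separated from each other the way you would want.
In that case the silhouette score has no clear maximum, and the elbow method will not work either. That may be your situation: there are no real clusters in the data.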
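To check whether this is what is happening, one option is to compare your scores against data that has no structure at all. The sketch below is only an illustration under assumed synthetic data (a uniform square versus three separated blobs); `silhouette_by_k` is a hypothetical helper name:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, k_values, random_state=0):
    """Mean silhouette score of a k-means fit, for each k."""
    result = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        result[k] = silhouette_score(X, labels)
    return result


rng = np.random.default_rng(0)
# Assumption: points spread evenly over a square, i.e. no real clusters.
X_uniform = rng.uniform(-10, 10, size=(500, 2))
# Assumption: three clearly separated groups, for comparison.
X_blobs, _ = make_blobs(
    n_samples=500,
    centers=[(-6, -6), (0, 6), (6, -6)],
    cluster_std=0.8,
    random_state=0,
)

k_values = range(2, 9)
uniform_scores = silhouette_by_k(X_uniform, k_values)
blob_scores = silhouette_by_k(X_blobs, k_values)

print(" k  uniform  blobs")
for k in k_values:
    print(f"{k:2d}  {uniform_scores[k]:7.3f}  {blob_scores[k]:5.3f}")
```

If the scores for your own data look like the uniform column (low for every k, with no value standing out), that supports the conclusion that the data has no real cluster structure.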
There are several ways to do this. For example, you can let Yellowbrick do the work.
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.cluster import KMeans
from sklearn import datasets
from yellowbrick.cluster import KElbowVisualizer
mpl.rcParams["figure.figsize"] = (9,6)
# Load iris flower dataset
iris = datasets.load_iris()
X = iris.data  # clustering is unsupervised, so we load only the features (iris.data), not the labels (iris.target)
# Convert the data into a DataFrame and inspect the first rows
feature_names = iris.feature_names
iris_dataframe = pd.DataFrame(X, columns=feature_names)
print(iris_dataframe.head(10))
# Fit a baseline k-means model with 3 clusters (the Iris dataset is known to contain 3 classes)
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(X)
# Plotting a 3d plot using matplotlib to visualize the data points
fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(111, projection='3d')
# Setting the colors to match cluster results
colors = ['red' if label == 0 else 'purple' if label==1 else 'green' for label in k_means.labels_]
ax.scatter(X[:,3], X[:,0], X[:,2], c=colors)
# Instantiate the clustering model and the elbow visualizer
model = KMeans(n_init=10, random_state=0)
visualizer = KElbowVisualizer(model, k=(2, 11))  # tries k = 2, ..., 10
visualizer.fit(X)   # Fit k-means for each k and record the score
visualizer.show()   # Finalize and render the elbow plot
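If you would rather not depend on Yellowbrick, the same curve can be computed with scikit-learn alone. The sketch below is an assumption-laden illustration, not Yellowbrick's implementation: `inertia_curve` and `elbow_by_chord_gap` are hypothetical helpers, and the "largest gap below the chord" rule is just one simple heuristic for locating an elbow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris


def inertia_curve(X, k_values, random_state=0):
    """Within-cluster sum of squares (KMeans.inertia_) for each k."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in k_values
    ]


def elbow_by_chord_gap(k_values, inertias):
    """Hypothetical heuristic: return (k, gap) for the point lying farthest
    below the straight line that joins the first and last points of the
    curve, after rescaling both axes to [0, 1]."""
    k = np.asarray(k_values, dtype=float)
    w = np.asarray(inertias, dtype=float)
    x = (k - k[0]) / (k[-1] - k[0])
    y = (w - w.min()) / (w.max() - w.min())
    chord = y[0] + (y[-1] - y[0]) * x  # height of the straight line at each x
    gaps = chord - y                   # how far the curve dips below it
    idx = int(np.argmax(gaps))
    return int(k_values[idx]), float(gaps[idx])


X = load_iris().data
k_values = list(range(2, 11))
inertias = inertia_curve(X, k_values)
elbow_k, gap = elbow_by_chord_gap(k_values, inertias)

for k, w in zip(k_values, inertias):
    print(f"k={k}: inertia={w:.1f}")
print(f"elbow candidate: k={elbow_k} (gap={gap:.2f})")
```

A gap close to 0 means the curve is nearly a straight line, which is the situation described in the question: there is no elbow to find, so the reported k should not be trusted.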
See the link below for more information.