Should we scale the data before using KElbowVisualizer for clustering in Python?
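I know that data should be scaled before any clustering.
What I want to ask is whether KElbowVisualizer does the scaling itself, or whether I should scale the data before passing it in.
I searched the documentation for this method but did not find an answer. If you have found one, could you share it with me? Thanks.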
I looked at the implementation of KElbowVisualizer in yellowbrick/cluster/elbow.py on GitHub, and I did not find any code under the fit function (line 306) that scales the X variable.
    # https://github.com/DistrictDataLabs/yellowbrick/blob/main/yellowbrick/cluster/elbow.py
    #...
    def fit(self, X, y=None, **kwargs):
        """
        Fits n KMeans models where n is the length of ``self.k_values_``,
        storing the silhouette scores in the ``self.k_scores_`` attribute.
        The "elbow" and silhouette score corresponding to it are stored in
        ``self.elbow_value`` and ``self.elbow_score`` respectively.
        This method finishes up by calling draw to create the plot.
        """
        self.k_scores_ = []
        self.k_timers_ = []
        self.kneedle = None
        self.knee_value = None
        if self.locate_elbow:
            self.elbow_value_ = None
            self.elbow_score_ = None
        for k in self.k_values_:
            # Compute the start time for each model
            start = time.time()
            # Set the k value and fit the model
            self.estimator.set_params(n_clusters=k)
            self.estimator.fit(X, **kwargs)
            # Append the time and score to our plottable metrics
            self.k_timers_.append(time.time() - start)
            self.k_scores_.append(self.scoring_metric(X, self.estimator.labels_))
    #...
So you will probably need to scale your data (the X argument) yourself before passing it to KElbowVisualizer().fit().
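As a rough sketch of what that could look like, the snippet below standardizes the features with scikit-learn's StandardScaler and only then hands them to the visualizer. The toy data, the helper names scale_features and fit_elbow, the k range, and the KMeans settings are all made up for illustration; they are assumptions, not something taken from the question or from yellowbrick itself.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def scale_features(X):
    """Standardize each column to zero mean and unit variance (hypothetical helper)."""
    return StandardScaler().fit_transform(X)


def fit_elbow(X_scaled, k_range=(2, 8)):
    """Fit KElbowVisualizer on data that has already been scaled (hypothetical helper)."""
    # Imported lazily so the scaling helper works even without yellowbrick installed.
    from yellowbrick.cluster import KElbowVisualizer

    # A 2-tuple for k is treated as a half-open range, here k = 2..7.
    visualizer = KElbowVisualizer(KMeans(n_init=10, random_state=0), k=k_range)
    # fit() clusters X exactly as given, so the scaling has to happen beforehand.
    visualizer.fit(X_scaled)
    return visualizer


# Toy data (invented): two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
X_scaled = scale_features(X)
```

Calling fit_elbow(X_scaled).show() should then draw the elbow plot on the scaled data, and the elbow_value_ attribute of the returned visualizer holds the suggested k if one was located. Whether StandardScaler is the right choice depends on your data; MinMaxScaler or another transformer can be swapped in the same way.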