Agglomerative Clustering Hierarchy Visualization
How can I visualize the hierarchy produced by agglomerative clustering as a dendrogram? I have a precomputed distance matrix of size (400, 400).
# `metric` was called `affinity` before scikit-learn 1.2
clusterer = AgglomerativeClustering(n_clusters=32, metric="precomputed", linkage="average").fit(distance_matrix)
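How can I clearly visualize the resulting 32 clusters as a dendrogram? I tried plotting the clusters in a scatter plot, but with 32 of them the colors are not distinct enough to tell the clusters apart.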
import matplotlib.pyplot as plt

# clusters_df: one row per sample, with columns median_score and count_intersections
colors_clusters = clusterer.labels_
fig = plt.figure(figsize=(10, 7))
plt.xlabel('median_score', family='Arial', fontsize=9)
plt.ylabel('count_intersections', family='Arial', fontsize=9)
plt.title('Heliopolis', family='Arial', fontsize=12)
plt.scatter(clusters_df['median_score'], clusters_df['count_intersections'], c=colors_clusters, edgecolors='black', s=50)
plt.show()
One well-known way to visualize hierarchical clustering is a dendrogram. You can find a plotting example in the sklearn documentation, and more examples in the scipy documentation.
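If you only need the plot, scipy can work on the precomputed matrix directly. The following is a minimal sketch, assuming `distance_matrix` is a symmetric (400, 400) NumPy array with zeros on the diagonal; the random points below are a hypothetical stand-in for the real data. `squareform` converts the square matrix into the condensed vector that `linkage` expects, `fcluster(..., criterion="maxclust")` cuts the tree into at most 32 flat clusters, and `truncate_mode="lastp"` with `p=32` draws only the last 32 merges, so each leaf of the plot should correspond to one cluster instead of relying on 32 colors.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the real (400, 400) precomputed distance matrix.
rng = np.random.default_rng(0)
points = rng.normal(size=(400, 2))
distance_matrix = squareform(pdist(points))

# linkage() expects a condensed (1-D) distance vector, not a square matrix;
# checks=False tolerates tiny floating-point asymmetries in a real matrix.
condensed = squareform(distance_matrix, checks=False)
Z = linkage(condensed, method="average")

# Flat cluster labels (1..32) for the 32-cluster cut of the same tree.
labels = fcluster(Z, t=32, criterion="maxclust")

fig, ax = plt.subplots(figsize=(12, 6))
# Only the last 32 merges are drawn; each leaf is one cluster, labelled with
# its size in parentheses (or a sample index if the cluster is a single point).
dendrogram(Z, truncate_mode="lastp", p=32, show_leaf_counts=True, ax=ax)
ax.set_title("Hierarchical Clustering Dendrogram (32 clusters)")
ax.set_xlabel("Number of points in node (or index of point if no parenthesis)")
ax.set_ylabel("Average-linkage distance")
plt.show()
```

Because the labels come from `fcluster` on the same linkage matrix, they can be used to color or annotate the scatter plot consistently with the dendrogram.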
Here is the example from the former (sklearn) link:
import numpy as np
from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram
from sklearn.datasets import load_iris
from sklearn.cluster import AgglomerativeClustering
def plot_dendrogram(model, **kwargs):
    # Create linkage matrix and then plot the dendrogram

    # create the counts of samples under each node
    counts = np.zeros(model.children_.shape[0])
    n_samples = len(model.labels_)
    for i, merge in enumerate(model.children_):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf node
            else:
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count

    linkage_matrix = np.column_stack([model.children_, model.distances_,
                                      counts]).astype(float)

    # Plot the corresponding dendrogram
    dendrogram(linkage_matrix, **kwargs)
iris = load_iris()
X = iris.data
# setting distance_threshold=0 ensures we compute the full tree.
model = AgglomerativeClustering(distance_threshold=0, n_clusters=None)
model = model.fit(X)
plt.title('Hierarchical Clustering Dendrogram')
# plot the top three levels of the dendrogram
plot_dendrogram(model, truncate_mode='level', p=3)
plt.xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()