Custom cluster colors of SciPy dendrogram in Python (link_color_func?)
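I want to color my clusters with a color map that I made in the form of a dictionary (i.e. {leaf: color}).
I tried following https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/ but the colors get messed up for some reason. The default plot looks good; I just want to assign those colors differently. I saw that there is a link_color_func parameter, but when I tried passing my color map (the D_leaf_colors dictionary) I got an error because it is not a function. I created D_leaf_colors to customize the colors of the leaves associated with particular clusters. In my actual dataset the colors mean something, so I am steering away from arbitrary color assignments.
I don't want to use color_threshold because in my actual data I have many more clusters and SciPy repeats the colors, hence this question.
How can I use my leaf-color dictionary to customize the colors of my dendrogram clusters?
I opened a GitHub issue, https://github.com/scipy/scipy/issues/6346, where I further elaborated on the approach to coloring the leaves, but I still can't figure out how to either: (i) use the dendrogram output to reconstruct my dendrogram with my specified color dictionary, or (ii) reformat my D_leaf_colors dictionary for the link_color_func parameter.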
# Init
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
# Load data
from sklearn.datasets import load_diabetes
# Clustering
from scipy.cluster.hierarchy import dendrogram, fcluster, leaves_list
from scipy.spatial import distance
from fastcluster import linkage # You can use SciPy one too
%matplotlib inline
# Dataset
A_data = load_diabetes().data
DF_diabetes = pd.DataFrame(A_data, columns = ["attr_%d" % j for j in range(A_data.shape[1])])
# Absolute value of correlation matrix, then subtract from 1 for dissimilarity
DF_dism = 1 - np.abs(DF_diabetes.corr())
# Compute average linkage
A_dist = distance.squareform(DF_dism.values)  # DataFrame.as_matrix() was removed in pandas 1.0
Z = linkage(A_dist,method="average")
# Color mapping
D_leaf_colors = {"attr_1": "#808080",  # Unclustered gray
                 "attr_4": "#B061FF",  # Cluster 1 indigo
                 "attr_5": "#B061FF",
                 "attr_2": "#B061FF",
                 "attr_8": "#B061FF",
                 "attr_6": "#B061FF",
                 "attr_7": "#B061FF",
                 "attr_0": "#61ffff",  # Cluster 2 cyan
                 "attr_3": "#61ffff",
                 "attr_9": "#61ffff",
                 }
# Dendrogram
# To get this dendrogram coloring below `color_threshold=0.7`
D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=None, leaf_font_size=12, leaf_rotation=45, link_color_func=D_leaf_colors)
# TypeError: 'dict' object is not callable
I also tried "how do I get the subtrees of dendrogram made by scipy.cluster.hierarchy".
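For reference, the TypeError arises because link_color_func has to be a callable: SciPy calls it once per link with that link's integer id and expects a color string back. Below is a minimal sketch on hypothetical toy data (the points, the seen_ids list, and the color_by_id helper are illustrative assumptions, not part of the question); no_plot=True is used so that nothing is drawn.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical toy data: four points on a line forming two tight pairs.
X = np.array([[0.0], [0.1], [5.0], [5.3]])
Z = linkage(X, method="average")
n_leaves = Z.shape[0] + 1  # leaves get ids 0 .. n_leaves - 1

seen_ids = []  # records every id SciPy passes to the callback

def color_by_id(link_id):
    """Return one color string for the link with the given id."""
    seen_ids.append(link_id)
    return "#808080"

# A dict is rejected here, but any function with this signature is accepted.
D = dendrogram(Z, no_plot=True, link_color_func=color_by_id)

print(sorted(seen_ids))   # link ids start at n_leaves
print(D["color_list"])    # one color per link
```

The ids passed to the callback start at the number of leaves, which is why a {leaf: color} dictionary cannot be used directly: it first has to be turned into a {link id: color} mapping.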
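I found a hackish solution. It does require color_threshold (I needed it to reproduce the original coloring; otherwise the colors differ from those shown in the OP), but it might lead you to a solution. However, you may not have enough information to know how to order the color palette.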
# Init
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
# Load data
from sklearn.datasets import load_diabetes
# Clustering
from scipy.cluster.hierarchy import dendrogram, fcluster, leaves_list, set_link_color_palette
from scipy.spatial import distance
from fastcluster import linkage # You can use SciPy one too
%matplotlib inline
# Dataset
A_data = load_diabetes().data
DF_diabetes = pd.DataFrame(A_data, columns = ["attr_%d" % j for j in range(A_data.shape[1])])
# Absolute value of correlation matrix, then subtract from 1 for dissimilarity
DF_dism = 1 - np.abs(DF_diabetes.corr())
# Compute average linkage
A_dist = distance.squareform(DF_dism.values)  # DataFrame.as_matrix() was removed in pandas 1.0
Z = linkage(A_dist,method="average")
# Color mapping dict not relevant in this case
# Dendrogram
# To get this dendrogram coloring below `color_threshold=0.7`
# Change the color palette; grey is not included because it is used above the threshold
set_link_color_palette(["#B061FF", "#61ffff"])
D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=.7, leaf_font_size=12, leaf_rotation=45,
               above_threshold_color="grey")
Result: [dendrogram figure]
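On the palette-order caveat: one way to check which palette entry ends up on which cluster is to inspect the dict that dendrogram returns instead of guessing. The sketch below uses hypothetical toy data and labels (assumptions for illustration only). The "color_list" key holds one color per link; "leaves_color_list" is assumed to be present only in newer SciPy releases, so it is read with .get.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, set_link_color_palette

# Hypothetical toy data: two tight groups that are far apart.
X = np.array([[0.0], [0.2], [0.4], [10.0], [10.2], [10.4]])
labels = ["a", "b", "c", "d", "e", "f"]
Z = linkage(X, method="average")

set_link_color_palette(["#B061FF", "#61ffff"])
try:
    # no_plot=True computes the coloring without drawing anything
    D = dendrogram(Z, labels=labels, color_threshold=1.0,
                   above_threshold_color="grey", no_plot=True)
finally:
    set_link_color_palette(None)  # restore SciPy's default palette

link_colors = D["color_list"]                # one entry per link
leaf_order = D["ivl"]                        # labels in plotting order
leaf_colors = D.get("leaves_color_list")     # may be None on older SciPy

print(link_colors)
print(leaf_order)
print(leaf_colors)
```

If the clusters come out with the colors swapped, reversing the list passed to set_link_color_palette and re-running is usually enough for two clusters; with many clusters this trial-and-error gets tedious, which is the limitation mentioned above.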
Here is a solution that uses the return matrix Z of linkage() (described early on, but a bit hidden in the docs) and link_color_func:
# see question for code prior to "color mapping"
# Color mapping
dflt_col = "#808080" # Unclustered gray
D_leaf_colors = {"attr_1": dflt_col,
                 "attr_4": "#B061FF",  # Cluster 1 indigo
                 "attr_5": "#B061FF",
                 "attr_2": "#B061FF",
                 "attr_8": "#B061FF",
                 "attr_6": "#B061FF",
                 "attr_7": "#B061FF",
                 "attr_0": "#61ffff",  # Cluster 2 cyan
                 "attr_3": "#61ffff",
                 "attr_9": "#61ffff",
                 }
# notes:
# * rows in Z correspond to "inverted U" links that connect clusters
# * rows are ordered by increasing distance
# * if the colors of the connected clusters match, use that color for link
# Leaves have ids 0..len(Z); the link built from row i of Z has id len(Z) + 1 + i
link_cols = {}
for i, i12 in enumerate(Z[:,:2].astype(int)):
    c1, c2 = (link_cols[x] if x > len(Z) else D_leaf_colors["attr_%d"%x]
              for x in i12)
    link_cols[i+1+len(Z)] = c1 if c1 == c2 else dflt_col
# Dendrogram
D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=None,
               leaf_font_size=12, leaf_rotation=45, link_color_func=lambda x: link_cols[x])
Here is the output: [dendrogram figure]
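The same idea can be wrapped in a small helper that does not depend on the "attr_%d" naming. This is a sketch under stated assumptions: the function name link_colors_from_leaves, the toy data, and the labels are hypothetical, and the caller is assumed to pass leaf colors in the same order as the rows given to linkage().

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

def link_colors_from_leaves(Z, leaf_colors, default="#808080"):
    """Map each link id to a color, given one color per leaf (in input order).

    A link takes the color shared by its two children; if the children
    disagree, it takes `default`, and that default propagates upward.
    """
    n_leaves = Z.shape[0] + 1
    node_colors = dict(enumerate(leaf_colors))  # ids 0 .. n_leaves - 1 are leaves
    link_cols = {}
    for i, (left, right) in enumerate(Z[:, :2].astype(int)):
        c1, c2 = node_colors[left], node_colors[right]
        merged = c1 if c1 == c2 else default
        node_colors[n_leaves + i] = merged  # row i of Z creates node n_leaves + i
        link_cols[n_leaves + i] = merged
    return link_cols

# Hypothetical toy data: two pairs with distinct merge heights.
labels = ["a", "b", "c", "d"]
X = np.array([[0.0], [0.1], [5.0], [5.3]])
Z = linkage(X, method="average")

D_leaf_colors = {"a": "#B061FF", "b": "#B061FF", "c": "#61ffff", "d": "#61ffff"}
link_cols = link_colors_from_leaves(Z, [D_leaf_colors[name] for name in labels])

D = dendrogram(Z, labels=labels, no_plot=True,
               link_color_func=lambda k: link_cols[k])
print(link_cols)
```

Keeping the leaf dictionary keyed by label and converting it to a list in input order keeps the helper independent of how the labels are spelled.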
A two-liner for applying a custom colormap to the cluster branches:
import numpy as np
import matplotlib as mpl
from matplotlib.pyplot import cm
from scipy.cluster import hierarchy
cmap = cm.rainbow(np.linspace(0, 1, 10))
hierarchy.set_link_color_palette([mpl.colors.rgb2hex(rgb[:3]) for rgb in cmap])
You can then replace rainbow with any cmap and change 10 to the number of clusters you want.
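Note that the palette only takes effect for links below color_threshold, and that it is a global setting. A hedged sketch of the full round trip follows, using hypothetical toy data with three obvious clusters (the data, n_clusters, and the threshold value are assumptions for illustration):

```python
import numpy as np
from matplotlib import cm
from matplotlib.colors import to_hex
from scipy.cluster import hierarchy

n_clusters = 3  # assumed number of clusters for this toy example
palette = [to_hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, n_clusters))]

# Hypothetical toy data: three tight pairs spread along a line.
X = np.array([[0.0], [0.2], [5.0], [5.2], [10.0], [10.2]])
Z = hierarchy.linkage(X, method="average")

hierarchy.set_link_color_palette(palette)
try:
    # Links below the threshold cycle through the palette; the rest are grey.
    D = hierarchy.dendrogram(Z, color_threshold=1.0,
                             above_threshold_color="grey", no_plot=True)
finally:
    hierarchy.set_link_color_palette(None)  # reset the global palette

print(palette)
print(D["color_list"])
```

Resetting with set_link_color_palette(None) avoids leaking the custom palette into later, unrelated dendrogram calls in the same session.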