Dendrogram coloring by groups
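I created a heatmap from a Spearman correlation matrix with seaborn's clustermap, and I would like to color its dendrograms so that they look like this:
[image: dendrogram]
but on the heatmap.
I created a color dictionary as shown below, but I get an error: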
def assign_tree_colour(name,val_dict,coding_names_df):
    ret = None
    if val_dict.get(name, '') == 'Group 1':
        ret = "(0,0.9,0.4)" #green
    elif val_dict.get(name, '') == 'Group 2':
        ret = "(0.6,0.1,0)" #red
    elif val_dict.get(name, '') == 'Group 3':
        ret = "(0.3,0.8,1)" #light blue
    elif val_dict.get(name, '') == 'Group 4':
        ret = "(0.4,0.1,1)" #purple
    elif val_dict.get(name, '') == 'Group 5':
        ret = "(1,0.9,0.1)" #yellow
    elif val_dict.get(name, '') == 'Group 6':
        ret = "(0,0,0)" #black
    else:
        ret = "(0,0,0)" #black
    return ret
def fix_string(str):
    return str.replace('"', '')
external_data3 = [list(z) for z in coding_names_df.values]
external_data3 = {fix_string(z[0]): z[3] for z in external_data3}
tree_label = list(df.index)
tree_label = [fix_string(x) for x in tree_label]
tree_labels = { j : tree_label[j] for j in range(0, len(tree_label) ) }
tree_colour = [assign_tree_colour(label, external_data3, coding_names_df) for label in tree_labels]
tree_colors = { i : tree_colour[i] for i in range(0, len(tree_colour) ) }
sns.set(color_codes=True)
sns.set(font_scale=1)
g = sns.clustermap(df, cmap="bwr",
                   vmin=-1, vmax=1,
                   yticklabels=1, xticklabels=1,
                   cbar_kws={"ticks":[-1,-0.5,0,0.5,1]},
                   figsize=(13,13),
                   row_colors=row_colors,
                   col_colors=col_colors,
                   method='average',
                   metric='correlation',
                   tree_kws=dict(colors=tree_colors))
g.ax_heatmap.set_xlabel('Genus')
g.ax_heatmap.set_ylabel('Genus')
for label in Group.unique():
    g.ax_col_dendrogram.bar(0, 0, color=lut[label],
                            label=label, linewidth=0)
g.ax_col_dendrogram.legend(loc=9, ncol=7, bbox_to_anchor=(0.26, 0., 0.5, 1.5))
ax=g.ax_heatmap
File "<ipython-input-64-4bc6be89afe3>", line 11, in <module>
tree_kws=dict(colors=tree_colors))
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1391, in clustermap
tree_kws=tree_kws, **kwargs)
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1208, in plot
tree_kws=tree_kws)
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1054, in plot_dendrograms
tree_kws=tree_kws
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 776, in dendrogram
return plotter.plot(ax=ax, tree_kws=tree_kws)
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 692, in plot
**tree_kws)
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\collections.py", line 1316, in __init__
colors = mcolors.to_rgba_array(colors)
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 294, in to_rgba_array
result[i] = to_rgba(cc, alpha)
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 177, in to_rgba
rgba = _to_rgba_no_colorcycle(c, alpha)
File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 240, in _to_rgba_no_colorcycle
raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))
ValueError: Invalid RGBA argument: 0
Any help would be appreciated!
Thanks!
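According to the sns.clustermap documentation, dendrogram coloring can be set through tree_kws (which takes a dict) and its colors key. seaborn passes tree_kws on to the matplotlib LineCollection that draws the tree, so colors must be a list of valid matplotlib colors, such as RGB tuples like (0.5, 0.5, 1). The strings in the question, such as "(0,0.9,0.4)", are not valid matplotlib colors, and tree_colors is a dict: iterating over a dict yields its keys, so matplotlib tries to interpret the key 0 as a color, which is exactly the "Invalid RGBA argument: 0" error.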
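A minimal sketch of the fix in plain Python, assuming, as in the question, that `val_dict` maps a quote-stripped label to a group name such as 'Group 1'. `GROUP_COLOURS`, `DEFAULT_COLOUR` and `tree_colour_list` are hypothetical names; the point is that they produce real RGB tuples in a list instead of strings in a dict. Note also that the question's comprehension iterates over `tree_labels`, which is a dict, so it receives the integer keys rather than the genus names and every label falls through to black.

```python
# Hypothetical lookup table equivalent to the question's if/elif chain,
# but holding real RGB tuples instead of strings such as "(0,0.9,0.4)".
GROUP_COLOURS = {
    "Group 1": (0.0, 0.9, 0.4),  # green
    "Group 2": (0.6, 0.1, 0.0),  # red
    "Group 3": (0.3, 0.8, 1.0),  # light blue
    "Group 4": (0.4, 0.1, 1.0),  # purple
    "Group 5": (1.0, 0.9, 0.1),  # yellow
    "Group 6": (0.0, 0.0, 0.0),  # black
}
DEFAULT_COLOUR = (0.0, 0.0, 0.0)  # black for labels without a known group


def assign_tree_colour(name, val_dict):
    """Return the RGB tuple for the group that `name` belongs to."""
    return GROUP_COLOURS.get(val_dict.get(name, ""), DEFAULT_COLOUR)


def tree_colour_list(labels, val_dict):
    """Return a list (not a dict) with one RGB tuple per label, in order."""
    # Iterate over the label strings themselves, not over dict keys.
    return [assign_tree_colour(label.replace('"', ""), val_dict) for label in labels]


if __name__ == "__main__":
    groups = {"Genus A": "Group 1", "Genus B": "Group 4"}  # made-up example
    print(tree_colour_list(['"Genus A"', "Genus B", "Genus C"], groups))
```

Keep in mind that even with valid colours, matplotlib applies the list to the dendrogram's n - 1 links in drawing order, not to the n labels, so a per-label list will not line up with the groups by itself.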
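Have you noticed that clustermap accepts nested lists or a DataFrame for row_colors/col_colors, which gives hierarchical color bars between the dendrogram and the correlation matrix? They might be useful if the dendrogram gets too crowded.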
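As a sketch of the nested-list form, assuming each row has a coarse group and a finer subgroup (both hypothetical here), the levels are passed as a list of lists: one inner list per colour bar, each as long as the number of rows. `side_colour_levels` is an illustrative helper, not part of seaborn.

```python
def side_colour_levels(row_keys, level_maps):
    """Build nested-list row_colors: one inner list per level, one entry per row.

    row_keys   -- list of per-row tuples, e.g. (group, subgroup)
    level_maps -- one dict per level mapping a key to a matplotlib colour
    """
    return [
        [level_map[keys[level]] for keys in row_keys]
        for level, level_map in enumerate(level_maps)
    ]


rows = [("Group 1", "a"), ("Group 1", "b"), ("Group 2", "a")]
levels = side_colour_levels(
    rows,
    [{"Group 1": "green", "Group 2": "red"}, {"a": "0.3", "b": "0.7"}],
)
print(levels)  # [['green', 'green', 'red'], ['0.3', '0.7', '0.3']]
```

The result would then be passed as `sns.clustermap(df, row_colors=levels)`, assuming `rows` is in the same order as the rows of `df` before clustering; seaborn reorders the bars along with the data.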
Hope this helps!
Edit
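The RGB list is the sequence of line colors of the LineCollection: the colors are used, in order, as each line of both dendrograms is drawn. (The order seems to start at the rightmost branch of the column dendrogram.) To associate a label with a data point, you would need to work out the order in which the data points are drawn in the dendrogram.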
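One way to make that association is sketched below, under an assumption not verified against seaborn's source here: that the lines follow scipy's default `dendrogram` order, i.e. a post-order walk from the root that visits the child in linkage column 0, then the child in column 1, then draws the link joining them. `Z` is a linkage matrix in scipy's format (rows of `[child_a, child_b, distance, size]`, where ids below n are the original rows and ids of n or more refer to earlier merges), such as `g.dendrogram_row.linkage`. The function names are hypothetical. Each link gets its group's colour when all leaves below it share one group, and a neutral colour otherwise.

```python
def link_leaves_in_drawing_order(Z):
    """Return, for each dendrogram link, the leaf ids below it.

    Assumes links are drawn in post-order: left child (column 0),
    right child (column 1), then the link itself.
    """
    n = len(Z) + 1  # number of leaves = number of merges + 1
    # Leaves under every cluster id; a merge's children always have smaller
    # ids, so a single forward pass over the linkage rows is enough.
    members = {i: [i] for i in range(n)}
    for i, row in enumerate(Z):
        members[n + i] = members[int(row[0])] + members[int(row[1])]

    order = []
    stack = [(2 * n - 2, False)]  # start at the root (the last merge)
    while stack:
        node, children_done = stack.pop()
        if node < n:  # single leaves draw no link
            continue
        if children_done:
            order.append(members[node])
            continue
        left, right = int(Z[node - n][0]), int(Z[node - n][1])
        # Pushed in reverse so the left subtree is processed first.
        stack += [(node, True), (right, False), (left, False)]
    return order


def colour_links_by_group(Z, leaf_groups, group_colours, mixed=(0.0, 0.0, 0.0)):
    """Colour each link by its leaves' group, or `mixed` if they disagree."""
    colours = []
    for leaves in link_leaves_in_drawing_order(Z):
        groups = {leaf_groups[i] for i in leaves}
        colours.append(group_colours[groups.pop()] if len(groups) == 1 else mixed)
    return colours
```

For the question, `leaf_groups` would be the group of each row of `df` in its original order, and the result would go into `tree_kws=dict(colors=...)`. Because a correlation matrix is symmetric, its row and column linkages coincide, so one list should fit both dendrograms.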
Edit 2
Here is a minimal example of coloring the tree, based on the sns.clustermap examples:
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd
iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)
lut = dict(zip(species.unique(), "rbg"))
row_colors = species.map(lut)
# For demonstrating the hierarchical sidebar coloring
df_colors = pd.DataFrame(data={'r': row_colors[row_colors == 'r'], 'g': row_colors[row_colors == 'g'], 'b': row_colors[row_colors == 'b']})
# Simple class RGBA colormap
colmap = {'setosa': (1, 0, 0, 0.7), 'virginica': (0, 1, 0, 0.7), 'versicolor': (0, 0, 1, 0.7)}
g = sns.clustermap(iris, row_colors=df_colors, tree_kws={'colors':[colmap[s] for s in species]})
plt.savefig('clustermap.png')
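As you can see, the tree's lines are drawn in an order that starts at the top right of the image, so it is unrelated to the order of the data points shown in the clustermap. The color bars (controlled by the {row,col}_colors arguments), on the other hand, can be used for that purpose.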
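Building on the answer above, here is a brute-force example that colors the three main branches differently (the first 49 lines red, the next 35 green, the next 63 blue, and the remaining two black):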
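The counts follow from the fact that a dendrogram over n leaves is drawn with n - 1 U-shaped links, one per merge; iris has 150 rows, so the brute-force segments must add up to 149. A small sanity check of that arithmetic, using a hypothetical helper `segment_colours`:

```python
def segment_colours(segments):
    """Expand (count, colour) pairs into one colour per dendrogram link."""
    return [colour for count, colour in segments for _ in range(count)]


n_rows = 150  # iris has 150 samples, i.e. 150 leaves in the row dendrogram
colours = segment_colours([
    (49, (1, 0, 0, 1)),  # red
    (35, (0, 1, 0, 1)),  # green
    (63, (0, 0, 1, 1)),  # blue
    (2, (0, 0, 0, 1)),   # black: the two merges joining the main branches
])
assert len(colours) == n_rows - 1  # 49 + 35 + 63 + 2 == 149 links
```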
import matplotlib.pyplot as plt
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd
iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)
lut = dict(zip(species.unique(), "rbg"))
row_colors = species.map(lut)
# For demonstrating the hierarchical sidebar coloring
df_colors = pd.DataFrame(data={'r': row_colors[row_colors == 'r'], 'g': row_colors[row_colors == 'g'], 'b': row_colors[row_colors == 'b']})
# Simple class RGBA colormap
colmap = {'setosa': (1, 0, 0, 0.7), 'virginica': (0, 1, 0, 0.7), 'versicolor': (0, 0, 1, 0.7)}
g = sns.clustermap(iris, row_colors=df_colors, tree_kws={'colors':[(1,0,0,1)]*49+[(0,1,0,1)]*35+[(0,0,1,1)]*63+[(0,0,0,1)]*2})
plt.savefig('clustermap.png')
For the general case, the number of lines to color can be derived from the dendrogram's linkage matrix (in scipy's linkage format):
# The number of leaves is always the number of merges + 1
# (if we have 2 leaves we do 1 merge)
n_leaves = len(g.dendrogram_row.linkage)+1
# The last merge on the array is naturally the one that joins
# the last two broad clusters together
n0_ndx = len(g.dendrogram_row.linkage) - 1
# At index [n0_ndx] of the linkage array, positions [0] and [1],
# we have the "indexes" of the two clusters that were merged.
# However, in order to find the actual index of these two
# clusters in the linkage array, we must subtract from this
# position (cluster/element number) the total number of leaves,
# because the cluster number listed here starts at 0 with the
# individual elements given to the function; and these elements
# are not themselves part of the linkage array.
# So linkage[0] has cluster number equal to n_leaves; and conversely,
# to calculate the index of a cluster in the linkage array,
# we must subtract the value of n_leaves from the cluster number.
n1_ndx = int(g.dendrogram_row.linkage[n0_ndx][0])-n_leaves
n2_ndx = int(g.dendrogram_row.linkage[n0_ndx][1])-n_leaves
# Similarly we can find the array index of clusters further down
n21_ndx = int(g.dendrogram_row.linkage[n2_ndx][0])-n_leaves
n22_ndx = int(g.dendrogram_row.linkage[n2_ndx][1])-n_leaves
# And finally, having identified the array index of the clusters
# that we are interested in coloring, we can determine the number
# of members in each cluster, which is stored in position [3]
# of each element of the array
n1 = int(g.dendrogram_row.linkage[n1_ndx][3])-1
n21 = int(g.dendrogram_row.linkage[n21_ndx][3])-1
n22 = int(g.dendrogram_row.linkage[n22_ndx][3])-1
# So we can finally color, with RGBa tuples, an amount of elements
# equal to the number of elements in each cluster of interest.
g = sns.clustermap(iris, row_colors=df_colors,
                   tree_kws={'colors': [(1,0,0,1)]*n1
                                       + [(0,1,0,1)]*n21
                                       + [(0,0,1,1)]*n22
                                       + [(0,0,0,1)]*(n_leaves-1-n1-n21-n22)})
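The same idea can be generalized to any number of top-level branches, including branches that are a single leaf (where `linkage[...][0] - n_leaves` above would become negative). The sketch below is an illustration, not the answerer's code: it splits the tree at the most recent merge until `n_clusters` subtrees remain (similar in spirit to cutting the tree at a height), then walks the tree, assuming links are drawn in post-order (left child, linkage column 0, before right child, then the link itself), and colours every link inside a chosen subtree; links above the cut get `default`. `tree_link_colours` is a hypothetical name.

```python
def tree_link_colours(Z, n_clusters, palette, default=(0.0, 0.0, 0.0, 1.0)):
    """Colour the links of the top `n_clusters` subtrees of linkage matrix Z.

    Returns one colour per link (len(Z) of them) in the assumed drawing
    order: post-order, left child (column 0) before right child (column 1).
    `palette` must be non-empty; it is reused cyclically if too short.
    """
    n = len(Z) + 1  # number of leaves
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of leaves")

    # Split the most recent (highest-id) merge until enough subtrees remain.
    clusters = {2 * n - 2}  # the root
    while len(clusters) < n_clusters:
        node = max(c for c in clusters if c >= n)
        clusters.remove(node)
        clusters.update((int(Z[node - n][0]), int(Z[node - n][1])))

    colours, next_colour = [], 0
    stack = [(2 * n - 2, None, False)]
    while stack:
        node, colour, children_done = stack.pop()
        if children_done:  # both subtrees emitted: now this node's own link
            colours.append(default if colour is None else colour)
            continue
        if colour is None and node in clusters:
            # Subtree roots are reached left to right, so palette order
            # follows the dendrogram's branch order.
            colour = palette[next_colour % len(palette)]
            next_colour += 1
        if node < n:  # leaves have no link of their own
            continue
        left, right = int(Z[node - n][0]), int(Z[node - n][1])
        stack += [(node, colour, True), (right, colour, False), (left, colour, False)]
    return colours
```

It could replace the hand-computed `n1`, `n21`, `n22`, e.g. `tree_kws={'colors': tree_link_colours(g.dendrogram_row.linkage, 3, [(1,0,0,1), (0,1,0,1), (0,0,1,1)])}`. It matches the brute-force split only when the subtree split second is the same one the code above descends into (column 1 of the last merge).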
However, I haven't yet figured out a way to color the top dendrogram differently...
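One possible workaround, not tested in this thread: seaborn appears to draw each dendrogram as a single matplotlib LineCollection on `g.ax_row_dendrogram` and `g.ax_col_dendrogram`, so after calling clustermap the column dendrogram's colours could be replaced with LineCollection's `set_color`. The helper below is a hypothetical sketch that assumes that collection is the first one on the axes.

```python
def recolour_dendrogram(ax, colours):
    """Replace the line colours of the dendrogram drawn on `ax`.

    Assumes the dendrogram is the first matplotlib collection on the axes,
    as when seaborn's clustermap draws it as a LineCollection.
    """
    if not ax.collections:
        raise ValueError("no collection found on these axes")
    lines = ax.collections[0]
    # Either a single colour for everything or one colour per link.
    if len(colours) not in (1, len(lines.get_segments())):
        raise ValueError("need one colour per dendrogram link")
    lines.set_color(colours)
    return lines
```

Usage would be `recolour_dendrogram(g.ax_col_dendrogram, col_colours)` before `plt.savefig(...)`, where `col_colours` could be computed from `g.dendrogram_col.linkage` with the same counting approach as above.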