Dendrogram coloring by groups

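I used seaborn clustermap to create a heatmap based on a Spearman correlation matrix, as shown below. I want to color the dendrogram, and I would like it to look like this dendrogram, but on the heatmap.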

I created a color dictionary as follows, but I get an error:

def assign_tree_colour(name,val_dict,coding_names_df):
    ret = None
    if val_dict.get(name, '') == 'Group 1':
        ret = "(0,0.9,0.4)"   #green
    elif val_dict.get(name, '') == 'Group 2':
        ret = "(0.6,0.1,0)"   #red
    elif val_dict.get(name, '') == 'Group 3':
        ret = "(0.3,0.8,1)"   #light blue
    elif val_dict.get(name, '') == 'Group 4':
        ret = "(0.4,0.1,1)"   #purple
    elif val_dict.get(name, '') == 'Group 5':
        ret = "(1,0.9,0.1)"   #yellow
    elif val_dict.get(name, '') == 'Group 6':
        ret = "(0,0,0)"   #black
    else:
        ret = "(0,0,0)"         #black
    return ret

def fix_string(str):
    return str.replace('"', '')

external_data3 = [list(z) for z in coding_names_df.values]
external_data3 = {fix_string(z[0]): z[3] for z in external_data3}

tree_label = list(df.index)
tree_label = [fix_string(x) for x in tree_label]
tree_labels = { j : tree_label[j] for j in range(0, len(tree_label) ) }

tree_colour = [assign_tree_colour(label, external_data3, coding_names_df) for label in tree_labels]
tree_colors = { i : tree_colour[i] for i in range(0, len(tree_colour) ) }


sns.set(color_codes=True)
sns.set(font_scale=1)
g = sns.clustermap(df, cmap="bwr",
                   vmin=-1, vmax=1,
                   yticklabels=1, xticklabels=1,
                   cbar_kws={"ticks":[-1,-0.5,0,0.5,1]},
                   figsize=(13,13),
                   row_colors=row_colors,
                   col_colors=col_colors,
                   method='average',
                   metric='correlation',
                   tree_kws=dict(colors=tree_colors))
g.ax_heatmap.set_xlabel('Genus')
g.ax_heatmap.set_ylabel('Genus')
for label in Group.unique():
    g.ax_col_dendrogram.bar(0, 0, color=lut[label],
                            label=label, linewidth=0)
g.ax_col_dendrogram.legend(loc=9, ncol=7, bbox_to_anchor=(0.26, 0., 0.5, 1.5))
ax=g.ax_heatmap



  File "<ipython-input-64-4bc6be89afe3>", line 11, in <module>
    tree_kws=dict(colors=tree_colors))

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1391, in clustermap
    tree_kws=tree_kws, **kwargs)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1208, in plot
    tree_kws=tree_kws)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 1054, in plot_dendrograms
    tree_kws=tree_kws

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 776, in dendrogram
    return plotter.plot(ax=ax, tree_kws=tree_kws)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\matrix.py", line 692, in plot
    **tree_kws)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\collections.py", line 1316, in __init__
    colors = mcolors.to_rgba_array(colors)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 294, in to_rgba_array
    result[i] = to_rgba(cc, alpha)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 177, in to_rgba
    rgba = _to_rgba_no_colorcycle(c, alpha)

  File "C:\Users\rotemb\AppData\Local\Continuum\anaconda3\lib\site-packages\matplotlib\colors.py", line 240, in _to_rgba_no_colorcycle
    raise ValueError("Invalid RGBA argument: {!r}".format(orig_c))

ValueError: Invalid RGBA argument: 0

Any help would be greatly appreciated! Thanks!

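According to the sns.clustermap documentation, dendrogram coloring can be set through tree_kws (which takes a dict) and its colors attribute, which expects a list of RGB tuples such as (0.5, 0.5, 1). It also seems that colors supports nothing other than RGB tuple data. In your code, tree_colors is a dict keyed by integers and its values are strings rather than tuples; matplotlib appears to read the integer keys as colors, which would explain the message "Invalid RGBA argument: 0".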
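A minimal sketch of building such a list in plain Python, assuming one color per label. The names GROUP_COLORS, DEFAULT_COLOR and colors_for_labels, and the sample labels, are made up for illustration; only the group names and color values come from the question.

```python
# Hypothetical lookup table: group name -> RGB tuple (real tuples, not strings).
GROUP_COLORS = {
    "Group 1": (0.0, 0.9, 0.4),  # green
    "Group 2": (0.6, 0.1, 0.0),  # red
    "Group 3": (0.3, 0.8, 1.0),  # light blue
    "Group 4": (0.4, 0.1, 1.0),  # purple
    "Group 5": (1.0, 0.9, 0.1),  # yellow
}
DEFAULT_COLOR = (0.0, 0.0, 0.0)  # black for Group 6 and anything unknown


def colors_for_labels(labels, group_of):
    """Return a list of RGB tuples, one per label, in the order of `labels`."""
    return [GROUP_COLORS.get(group_of.get(label), DEFAULT_COLOR)
            for label in labels]


# Made-up labels and group assignments, only to exercise the function.
labels = ["Bacteroides", "Prevotella", "Unclassified"]
group_of = {"Bacteroides": "Group 1", "Prevotella": "Group 2"}

tree_colors = colors_for_labels(labels, group_of)
print(tree_colors)
```

The resulting list could then be passed as tree_kws=dict(colors=tree_colors). Each entry colors one drawn line of the dendrogram, so the list order will not necessarily match the label order.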

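Have you noticed that clustermap supports nested lists or data frames for hierarchical color bars between the dendrogram and the correlation matrix? They could be useful if the dendrogram gets too crowded.

Hope this helps!

Edit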

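The list of RGB tuples is a sequence of line colors for a LineCollection: the sequence is consumed as each line is drawn, in both dendrograms. (The order seems to start from the rightmost branch of the column dendrogram.) To associate a given label with a data point, you need to figure out the order in which the data points are drawn in the dendrograms.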
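One way to look at that order is to inspect scipy's dendrogram output directly, since seaborn builds its dendrograms on scipy's hierarchical clustering. The sketch below uses made-up data and only shows where the leaf order and the per-link coordinates live. That seaborn draws its lines in exactly the order of these lists is an assumption I have not checked across versions.

```python
import numpy as np
from scipy.cluster import hierarchy

# Made-up 1-D data: two tight pairs and one outlier.
points = np.array([[0.0], [1.0], [10.0], [11.0], [50.0]])

# Average linkage, the same method used in the question.
Z = hierarchy.linkage(points, method="average")

# no_plot=True returns the layout without drawing anything.
tree = hierarchy.dendrogram(Z, no_plot=True)

leaf_order = tree["leaves"]      # observation indices along the leaf axis
n_lines = len(tree["icoord"])    # one U-shaped link per merge

print("leaf order:", leaf_order)
print("number of link lines:", n_lines)
```

With n observations there are n - 1 links, so a colors list of that length gives one color per link.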

Edit II

Here is a minimal example of coloring the tree, based on the sns.clustermap examples:

import matplotlib.pyplot as plt
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd


iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)
lut = dict(zip(species.unique(), "rbg"))
row_colors = species.map(lut)
# For demonstrating the hierarchical sidebar coloring
df_colors = pd.DataFrame(data={'r': row_colors[row_colors == 'r'], 'g': row_colors[row_colors == 'g'], 'b': row_colors[row_colors == 'b']}) 
# Simple class RGBA colormap
colmap = {'setosa': (1, 0, 0, 0.7), 'virginica': (0, 1, 0, 0.7), 'versicolor': (0, 0, 1, 0.7)}
g = sns.clustermap(iris, row_colors=df_colors, tree_kws={'colors':[colmap[s] for s in species]})
plt.savefig('clustermap.png')

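As you can see, the tree's lines are drawn starting from the top-right corner of the image, so their order is unrelated to the order of the data points shown in the clustermap. The color bars (controlled by the {row,col}_colors attributes), on the other hand, can be used for that purpose.

Building on the answer above, here is an example that colors the three main branches differently by brute force (the first 49 lines in red, the next 35 lines in green, the next 63 lines in blue, and the remaining two lines in black):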

import matplotlib.pyplot as plt
import seaborn as sns; sns.set(color_codes=True)
import pandas as pd


iris = sns.load_dataset("iris")
species = iris.pop("species")
g = sns.clustermap(iris)
lut = dict(zip(species.unique(), "rbg"))
row_colors = species.map(lut)
# For demonstrating the hierarchical sidebar coloring
df_colors = pd.DataFrame(data={'r': row_colors[row_colors == 'r'], 'g': row_colors[row_colors == 'g'], 'b': row_colors[row_colors == 'b']}) 
# Simple class RGBA colormap
colmap = {'setosa': (1, 0, 0, 0.7), 'virginica': (0, 1, 0, 0.7), 'versicolor': (0, 0, 1, 0.7)}
g = sns.clustermap(iris, row_colors=df_colors, tree_kws={'colors':[(1,0,0,1)]*49+[(0,1,0,1)]*35+[(0,0,1,1)]*63+[(0,0,0,1)]*2})
plt.savefig('clustermap.png')

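For the general case, the number of lines to color can be derived from the dendrogram's linkage array (see the description of the scipy linkage format):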

# The number of leaves is always the number of merges + 1 
# (if we have 2 leaves we do 1 merge)

n_leaves = len(g.dendrogram_row.linkage)+1

# The last merge on the array is naturally the one that joins
# the last two broad clusters together

n0_ndx = len(g.dendrogram_row.linkage) - 1

# At index [n0_ndx] of the linkage array, positions [0] and [1],
# we have the "indexes" of the two clusters that were merged.
# However, in order to find the actual index of these two
# clusters in the linkage array, we must subtract from this 
# position (cluster/element number) the total number of leaves, 
# because the cluster number listed here starts at 0 with the
# individual elements given to the function; and these elements
# are not themselves part of the linkage array.
# So linkage[0] has cluster number equal to n_leaves; and conversely,
# to calculate the index of a cluster in the linkage array,
# we must subtract the value of n_leaves from the cluster number.

n1_ndx = int(g.dendrogram_row.linkage[n0_ndx][0])-n_leaves
n2_ndx = int(g.dendrogram_row.linkage[n0_ndx][1])-n_leaves

# Similarly we can find the array index of clusters further down

n21_ndx = int(g.dendrogram_row.linkage[n2_ndx][0])-n_leaves
n22_ndx = int(g.dendrogram_row.linkage[n2_ndx][1])-n_leaves

# And finally, having identified the array index of the clusters
# that we are interested in coloring, we can determine the number
# of members in each cluster, which is stored in position [3]
# of each element of the array

n1 = int(g.dendrogram_row.linkage[n1_ndx][3])-1
n21 = int(g.dendrogram_row.linkage[n21_ndx][3])-1
n22 = int(g.dendrogram_row.linkage[n22_ndx][3])-1

# So we can finally color, with RGBa tuples, an amount of elements
# equal to the number of elements in each cluster of interest.  
  
g = sns.clustermap(iris, row_colors=df_colors, tree_kws={'colors':[(1,0,0,1)]*n1+[(0,1,0,1)]*n21+[(0,0,1,1)]*n22+[(0,0,0,1)]*(n_leaves-1-n1-n21-n22)})

However, I have not yet figured out a way to color the top dendrogram differently...
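The index arithmetic above can be wrapped in a small helper. This is only a sketch: merges_under and top_split_counts are names I made up, and the toy linkage is written by hand. Unlike the snippet above, it also handles the case where one side of a merge is a single observation, where subtracting n_leaves would give a negative index.

```python
def merges_under(linkage, cluster_id):
    """Count the merges (drawn links) inside one cluster.

    Cluster ids below n_leaves are single observations with no merges.
    """
    n_leaves = len(linkage) + 1
    cluster_id = int(cluster_id)
    if cluster_id < n_leaves:
        return 0
    # Column 3 holds the number of observations in the cluster; a cluster
    # of k observations was built from k - 1 merges.
    return int(linkage[cluster_id - n_leaves][3]) - 1


def top_split_counts(linkage):
    """Return the merge counts of the two clusters joined by the last merge."""
    left, right = linkage[-1][0], linkage[-1][1]
    return merges_under(linkage, left), merges_under(linkage, right)


# Hand-written linkage for 4 observations (ids 0-3), hypothetical distances:
# row 0 creates cluster 4, row 1 creates cluster 5, row 2 creates cluster 6.
toy_linkage = [
    [0, 1, 0.1, 2],
    [4, 2, 0.2, 3],
    [5, 3, 0.5, 4],
]

n_left, n_right = top_split_counts(toy_linkage)
n_total = len(toy_linkage)  # total number of links in the dendrogram

# Red for one branch, green for the other, black for whatever is left.
colors = ([(1, 0, 0, 1)] * n_left
          + [(0, 1, 0, 1)] * n_right
          + [(0, 0, 0, 1)] * (n_total - n_left - n_right))
print(n_left, n_right, len(colors))
```

With seaborn, g.dendrogram_row.linkage could be passed in place of toy_linkage. Whether the counts line up with the branches depends on the drawing order discussed above.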