How to cluster two datasets into a single heatmap using two separate scales in Python?

Question

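I am trying to use the cluster heatmap function in Seaborn.

The problem is that the two datasets come from two different processes, so their values are distributed differently (that is, the values in the first dataset range from 0 to 1, while those in the second range from 1000 to 5000).

My question is:

How can I cluster two datasets that have different value ranges? Is there a way to cluster the rows of both datasets into a single heatmap, possibly with a separate scale for each dataset?

This is what I have tried so far, with little success: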

import pandas as pd
import seaborn as sns

# First, combine the two datasets into one DataFrame (rows stacked):
dataset = pd.concat([dataset_1, dataset_2], axis=0)

# Then pass the DataFrame to Seaborn's `clustermap()` function:
sns.clustermap(data=dataset, col_cluster=False)

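Output: you can see that the features of dataset_1 are all washed out because of the difference in scale between the datasets (dataset_1 and dataset_2 are shown below).

Any idea how to solve this?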

Answer 1

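You can use sklearn's preprocessing module, specifically one of its scalers, to rescale the data before creating the clustermap.

The documentation is here: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.scale.html#sklearn.preprocessing.scale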
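
As a rough sketch of that idea (not necessarily what the answer had in mind), each dataset can be standardized on its own with `sklearn.preprocessing.scale` before the rows are stacked. The helper names `scale_and_stack` and `plot_clustermap`, as well as the toy data, are made up for illustration; it is also assumed that both datasets share the same columns. Note that this puts both datasets on one comparable scale rather than drawing two separate colour scales.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale


def scale_and_stack(dataset_1, dataset_2):
    """Standardize each dataset separately, then stack the rows.

    scale() works column-wise by default (axis=0), so within each
    dataset every column ends up with mean 0 and unit variance.
    """
    scaled = [
        pd.DataFrame(scale(df), index=df.index, columns=df.columns)
        for df in (dataset_1, dataset_2)
    ]
    return pd.concat(scaled, axis=0)


def plot_clustermap(dataset):
    """Cluster the rows only, as in the question."""
    import seaborn as sns  # imported here so the scaling part runs without seaborn

    return sns.clustermap(data=dataset, col_cluster=False)


# Hypothetical toy data mimicking the two value ranges from the question.
rng = np.random.default_rng(0)
columns = ["a", "b", "c", "d"]
dataset_1 = pd.DataFrame(
    rng.uniform(0, 1, size=(5, 4)),
    index=[f"d1_row{i}" for i in range(5)],
    columns=columns,
)
dataset_2 = pd.DataFrame(
    rng.uniform(1000, 5000, size=(5, 4)),
    index=[f"d2_row{i}" for i in range(5)],
    columns=columns,
)

dataset = scale_and_stack(dataset_1, dataset_2)
```

Calling `plot_clustermap(dataset)` then draws the heatmap from the rescaled values. Seaborn's `clustermap()` also accepts `z_score` and `standard_scale` arguments that normalize along rows or columns, which may be enough if per-row normalization is acceptable.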


python

hierarchical-clustering

matplotlib

heatmap

seaborn