Convert pandas groupby dataframe into heatmap

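I am trying to find a good way to visualize data from the publicly available cBioPortal mutation data. I want to plot the co-occurrence of protein changes (so basically, for each sample ID, whether that particular sample also has any other mutations). See the image below:

I would like to plot this as a heatmap (example below):

I have managed to get the data into the form shown in the first image above, but I have no idea how to get from there to the example heatmap.

I have looked into: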

df.groupby(['Sample ID', 'Protein Change', 'Cancer Type Detailed']).count().unstack('Protein Change')

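This seems to be going in the right direction, but it is not quite there.

Basically, what I want is a heatmap with Protein Change on both axes, showing the number of times each pair of protein changes co-occurs in a single sample.

Any help would be appreciated. Thanks!!

Does this work for you?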

import pandas as pd
from itertools import combinations

# create a dataframe with all pairs of protein changes for each Sample ID
# (sorted and de-duplicated, so ('A', 'B') and ('B', 'A') count as the same pair)
df_combinations = df.groupby('Sample ID').agg({'Protein Change': lambda s: list(combinations(sorted(set(s)), 2))})
# transform the obtained dataset to a two-column format
df_combinations = pd.DataFrame(df_combinations['Protein Change'].explode().dropna().tolist(), columns=['Protein Change A', 'Protein Change B'])
# count how many times each pair appears
df_counts = df_combinations.groupby(['Protein Change A', 'Protein Change B']).size().to_frame('count').reset_index()
# pivot the df to obtain the desired matrix (pairs that never co-occur become 0)
df_counts.pivot(index='Protein Change A', columns='Protein Change B', values='count').fillna(0)
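
Note that the pivoted table is not necessarily square: a protein change that only ever shows up as the first (or second) member of a pair is missing from one axis, and each pair is stored in one orientation only. If you want the same labels on both axes, one option is to reindex and add the transpose. This is only a minimal sketch; the `df_counts` values here are made up for illustration, and it assumes a symmetric matrix is what you are after:

```python
import pandas as pd

# Hypothetical pair counts, in the same shape as df_counts above.
df_counts = pd.DataFrame({
    "Protein Change A": ["A", "A", "B", "A"],
    "Protein Change B": ["B", "C", "C", "D"],
    "count": [2, 1, 1, 1],
})

matrix = df_counts.pivot(index="Protein Change A",
                         columns="Protein Change B",
                         values="count")

# Put the same set of labels on both axes; missing pairs become 0.
labels = sorted(set(matrix.index) | set(matrix.columns))
matrix = matrix.reindex(index=labels, columns=labels).fillna(0)

# Adding the transpose fills in the mirrored cell of every pair.
symmetric = (matrix + matrix.T).astype(int)
print(symmetric)
```

The resulting `symmetric` frame can then be passed to a plotting function such as `seaborn.heatmap`.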

You can do it like this:

Sample data:

df = pd.DataFrame({'Sample ID': [1, 1, 1, 4, 4, 5, 6, 6],
                   'Protein Change': ['A', 'B', 'C', 'D', 'A', 'C', 'A', 'B'],
                   'Cancer Type Detailed': 'Some type'})
   Sample ID Protein Change Cancer Type Detailed
0          1              A            Some type
1          1              B            Some type
2          1              C            Some type
3          4              D            Some type
4          4              A            Some type
5          5              C            Some type
6          6              A            Some type
7          6              B            Some type

Code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

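# Build the co-occurrence matrix: ct has one row per sample and one column per protein change.
ct = pd.crosstab(df['Sample ID'], df['Protein Change'])
co_occurrence = ct.T.dot(ct)

# Set the diagonal to zero. Work on a copy, because to_numpy() can return
# a read-only view when pandas copy-on-write is enabled.
values = co_occurrence.to_numpy(copy=True)
np.fill_diagonal(values, 0)
co_occurrence = pd.DataFrame(values, index=co_occurrence.index, columns=co_occurrence.columns)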

f, ax = plt.subplots(figsize=(4, 5))

# Mask the lower triangle (including the diagonal) so each pair is drawn only once.
mask = np.tril(np.ones_like(co_occurrence, dtype=bool))

cmap = sns.light_palette("seagreen", as_cmap=True)
sns.heatmap(co_occurrence, mask=mask, cmap=cmap, square=True, cbar_kws={"shrink": .65})
plt.show()

Result:
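
If you want to check the numbers behind the heatmap, the matrix product can be verified by hand on the sample data. The following is a small sketch that uses only pandas and NumPy (no plotting); the variable name `co` is just for illustration. Entry (i, j) of `ct.T.dot(ct)` sums, over all samples, the product of the two indicator columns, which is the number of samples that contain both changes:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Sample ID': [1, 1, 1, 4, 4, 5, 6, 6],
                   'Protein Change': ['A', 'B', 'C', 'D', 'A', 'C', 'A', 'B']})

# One row per sample, one column per protein change (0/1 indicators here).
ct = pd.crosstab(df['Sample ID'], df['Protein Change'])

# (i, j) = number of samples containing both change i and change j.
# The diagonal holds the number of samples containing each change.
co = ct.T.dot(ct)
diagonal = np.diag(co.to_numpy()).tolist()

# Zero the diagonal on a writable copy.
values = co.to_numpy(copy=True)
np.fill_diagonal(values, 0)
co = pd.DataFrame(values, index=co.index, columns=co.columns)

print(diagonal)          # samples per change: A=3, B=2, C=2, D=1
print(co.loc['A', 'B'])  # A and B occur together in samples 1 and 6
```

If a sample can list the same protein change more than once, the crosstab entries exceed 1 and the products are inflated; in that case you may want to clip `ct` to 0/1 first.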