Convert pandas groupby dataframe into heatmap

Question

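I am trying to find a good way to visualize mutation data from the publicly available cBioPortal. I want to plot the co-occurrence of protein changes (so basically, for each Sample ID, whether that sample also carries any of the other mutations). See the image below:

I want to plot this as a heatmap (example below):

I have managed to get the data into the form shown in the first image above, but I have no idea how to get from there to the example heatmap.

I have looked into: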

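df.groupby(['Sample ID', 'Protein Change', 'Cancer Type Detailed']).count().unstack('Protein Change')

This seems to go in the right direction, but it is not quite right.

Basically, what I want is a heatmap with Protein Change on both axes, showing the number of times each pair of protein changes co-occurs within a single sample.

Any help would be greatly appreciated. Thanks!!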

Answer 1

Does this work for you?

from itertools import combinations
import pandas as pd

# build every pair of protein changes within each Sample ID;
# sorted(set(s)) keeps one fixed order per pair, so (A, B) and (B, A) are not counted separately
df_combinations = df.groupby('Sample ID').agg({'Protein Change': lambda s: list(combinations(sorted(set(s)), 2))})
# flatten to a two-column format (samples with a single protein change yield no pairs and are dropped)
df_combinations = pd.DataFrame(df_combinations['Protein Change'].explode().dropna().tolist(), columns=['Protein Change A', 'Protein Change B'])
# count how many times each combination appears
df_counts = df_combinations.groupby(['Protein Change A', 'Protein Change B']).size().to_frame('count').reset_index()
# pivot the df to obtain the desired matrix (pairs that never occur become 0)
df_counts.pivot(index='Protein Change A', columns='Protein Change B', values='count').fillna(0)
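
The pivot above only has rows and columns for labels that actually appear as the first or second member of a pair, so it is generally not square. A heatmap with the same labels on both axes needs a square matrix. The following is a minimal sketch of one way to get there, assuming the small sample data used in Answer 2; the names `upper` and `symmetric` are illustrative and not part of the answer, and `pd.crosstab` is swapped in for the groupby/pivot step.

```python
from itertools import combinations

import pandas as pd

# Illustrative sample data (same shape as the question's table).
df = pd.DataFrame({'Sample ID': [1, 1, 1, 4, 4, 5, 6, 6],
                   'Protein Change': ['A', 'B', 'C', 'D', 'A', 'C', 'A', 'B']})

# One sorted, de-duplicated pair list per sample, then one row per pair.
pairs = (df.groupby('Sample ID')['Protein Change']
           .apply(lambda s: list(combinations(sorted(set(s)), 2)))
           .explode()
           .dropna())
df_pairs = pd.DataFrame(pairs.tolist(),
                        columns=['Protein Change A', 'Protein Change B'])

# Count pairs; this fills only cells where A sorts before B.
upper = pd.crosstab(df_pairs['Protein Change A'], df_pairs['Protein Change B'])

# Force the same labels on both axes, then mirror across the diagonal.
labels = sorted(set(df['Protein Change']))
upper = upper.reindex(index=labels, columns=labels, fill_value=0)
symmetric = upper + upper.T

print(symmetric)
```

With this data, A and B co-occur in two samples (1 and 6), so both `symmetric.loc['A', 'B']` and `symmetric.loc['B', 'A']` are 2. The resulting frame can be passed directly to a heatmap function.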

Answer 2

You can do it like this:

Sample data:

df = pd.DataFrame({'Sample ID': [1, 1, 1, 4, 4, 5, 6, 6],
                   'Protein Change': ['A', 'B', 'C', 'D', 'A', 'C', 'A', 'B'],
                   'Cancer Type Detailed': 'Some type'})

   Sample ID Protein Change Cancer Type Detailed
0          1              A            Some type
1          1              B            Some type
2          1              C            Some type
3          4              D            Some type
4          4              A            Some type
5          5              C            Some type
6          6              A            Some type
7          6              B            Some type

Code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

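# Build the co-occurrence matrix: rows of ct are samples, columns are protein changes,
# so ct.T.dot(ct) counts, for each pair of changes, the samples that carry both.
ct = pd.crosstab(df['Sample ID'], df['Protein Change'])
co_occurrence = ct.T.dot(ct)
# Set the diagonal to zero without writing into the array returned by to_numpy(),
# which may be a copy or read-only depending on the pandas version.
co_occurrence = co_occurrence.mask(np.eye(len(co_occurrence), dtype=bool), 0)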

f, ax = plt.subplots(figsize=(4, 5))

# Mask the lower triangle (including the diagonal) so each pair is drawn once.
mask = np.tril(np.ones_like(co_occurrence, dtype=bool))

cmap = sns.light_palette("seagreen", as_cmap=True)
sns.heatmap(co_occurrence, mask=mask, cmap=cmap, square=True, cbar_kws={"shrink": .65})
plt.show()

Result:
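
The numbers behind the heatmap can also be checked without plotting. The sketch below repeats the crosstab step on the sample data and prints the matrix; the `.clip(upper=1)` call is an added assumption, not part of the answer, so that a protein change listed twice for one sample still counts as a single occurrence.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Sample ID': [1, 1, 1, 4, 4, 5, 6, 6],
                   'Protein Change': ['A', 'B', 'C', 'D', 'A', 'C', 'A', 'B'],
                   'Cancer Type Detailed': 'Some type'})

# Presence/absence table: one row per sample, one column per protein change.
ct = pd.crosstab(df['Sample ID'], df['Protein Change']).clip(upper=1)

# (ct.T @ ct)[i, j] = number of samples that carry both change i and change j.
co = ct.T.dot(ct)

# Zero the diagonal on a copy, then rebuild the labelled frame.
values = co.to_numpy().copy()
np.fill_diagonal(values, 0)
co_occurrence = pd.DataFrame(values, index=co.index, columns=co.columns)

print(co_occurrence)
```

For this data, the A/B cell is 2 (samples 1 and 6), A/C and A/D are 1, B/C is 1, and D never co-occurs with B or C.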
