Convert pandas groupby dataframe into heatmap
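I'm trying to find a good way to visualize data from the publicly available cBioPortal mutation data. I want to plot the co-occurrence of protein changes (basically, for each Sample ID, whether that particular sample also has any other mutations). See the image below:
I'd like to plot this as a heatmap (example below):
I've managed to get the data into the form shown in the first image above, but I'm completely lost on how to get from there to the example heatmap.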
I've looked into:
df.groupby(['Sample ID', 'Protein Change', 'Cancer Type Detailed']).count().unstack('Protein Change')
This seems to be heading in the right direction, but it's not quite right.
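Basically, what I want is a heatmap with Protein Change on both axes, where each cell shows how many times those two protein changes co-occur within a single sample.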
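To make the target concrete: for every unordered pair of protein changes, count the samples in which both appear. A minimal stdlib-only sketch of that counting, using the toy records from the second answer below rather than real cBioPortal data; `records`, `by_sample`, and `pair_counts` are illustrative names. Each off-diagonal heatmap cell would then be `pair_counts[(a, b)]`.

```python
from collections import Counter
from itertools import combinations

# Toy (Sample ID, Protein Change) records; illustrative data only.
records = [(1, "A"), (1, "B"), (1, "C"), (4, "D"), (4, "A"),
           (5, "C"), (6, "A"), (6, "B")]

# Group the distinct protein changes observed in each sample.
by_sample = {}
for sample_id, change in records:
    by_sample.setdefault(sample_id, set()).add(change)

# Count each unordered pair once per sample; sorting fixes the pair orientation.
pair_counts = Counter()
for changes in by_sample.values():
    pair_counts.update(combinations(sorted(changes), 2))

for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```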
Any help would be greatly appreciated. Thanks!!
Does this work for you?
import pandas as pd
from itertools import combinations
# Create all pairs of protein changes for each Sample ID; sorting puts (A, B) and (B, A) in the same cell
df_combinations = df.groupby('Sample ID').agg({'Protein Change': lambda s: list(combinations(sorted(s), 2))})
# Flatten the per-sample lists into a two-column table of pairs (samples with a single change yield no pairs and are dropped)
df_combinations = pd.DataFrame(df_combinations['Protein Change'].explode().dropna().tolist(), columns=['Protein Change A', 'Protein Change B'])
# Count how many times each pair appears across samples
df_counts = df_combinations.groupby(['Protein Change A', 'Protein Change B']).size().to_frame('count').reset_index()
# Pivot into a matrix; pairs that never co-occur become 0
df_counts.pivot(index='Protein Change A', columns='Protein Change B', values='count').fillna(0)
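The pivot above only has rows for changes that appear first in some pair and columns for changes that appear second, so it is usually not square, while a heatmap with Protein Change on both axes needs the same labels on each. A minimal sketch, assuming you want a full symmetric matrix: collect sorted pairs per sample, count them, reindex to one shared label list, and add the transpose. The plain loop over `groupby` replaces the list-valued `agg` only for readability; `rows`, `pairs`, `labels`, `upper`, and `symmetric` are illustrative names, and the data is the toy example from the other answer.

```python
from itertools import combinations

import pandas as pd

# Toy data (same values as the example in the other answer).
df = pd.DataFrame({'Sample ID': [1, 1, 1, 4, 4, 5, 6, 6],
                   'Protein Change': ['A', 'B', 'C', 'D', 'A', 'C', 'A', 'B']})

# Collect every unordered pair of distinct changes within each sample.
rows = []
for _, changes in df.groupby('Sample ID')['Protein Change']:
    rows.extend(combinations(sorted(changes.unique()), 2))
pairs = pd.DataFrame(rows, columns=['first', 'second'])

# Count each pair; this gives the (possibly non-square) upper part of the matrix.
counts = pairs.groupby(['first', 'second']).size().unstack(fill_value=0)

# Put every protein change on both axes, then mirror it to make it symmetric.
labels = sorted(df['Protein Change'].unique())
upper = counts.reindex(index=labels, columns=labels, fill_value=0)
symmetric = upper + upper.T

print(symmetric)
```

`symmetric` can then be passed to `sns.heatmap` the same way the crosstab-based answer passes its matrix.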
You can do it like this:
Sample data:
df = pd.DataFrame({'Sample ID': [1, 1, 1, 4, 4, 5, 6, 6],
                   'Protein Change': ['A', 'B', 'C', 'D', 'A', 'C', 'A', 'B'],
                   'Cancer Type Detailed': 'Some type'})
Sample ID Protein Change Cancer Type Detailed
0 1 A Some type
1 1 B Some type
2 1 C Some type
3 4 D Some type
4 4 A Some type
5 5 C Some type
6 6 A Some type
7 6 B Some type
Code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
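# Build the co-occurrence matrix: (protein changes x samples) . (samples x protein changes).
ct = pd.crosstab(df['Sample ID'], df['Protein Change'])
co_occurrence = ct.T.dot(ct)
# Zero the diagonal (each change trivially co-occurs with itself). Work on a copy,
# because to_numpy() can return a read-only view under pandas Copy-on-Write.
values = co_occurrence.to_numpy(copy=True)
np.fill_diagonal(values, 0)
co_occurrence = pd.DataFrame(values, index=co_occurrence.index, columns=co_occurrence.columns)
f, ax = plt.subplots(figsize=(4, 5))
# Mask the lower triangle (including the diagonal): the matrix is symmetric, so it repeats the upper half.
mask = np.tril(np.ones_like(co_occurrence, dtype=bool))
cmap = sns.light_palette("seagreen", as_cmap=True)
sns.heatmap(co_occurrence, mask=mask, cmap=cmap, square=True, ax=ax, cbar_kws={"shrink": .65})
plt.show()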
Result:
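Why `ct.T.dot(ct)` works: `ct` is a samples x protein-changes table, so entry (i, j) of the product sums `ct[s, i] * ct[s, j]` over samples `s`, which is the number of samples containing both changes as long as every entry is 0 or 1. The diagonal then holds how many samples carry each change, which is why the answer zeroes it before plotting. If a sample can list the same change more than once, binarizing first (here with `> 0` and `astype(int)`, an addition not in the answer) keeps the cells as sample counts. A small check on the toy data; `per_change` is an illustrative name:

```python
import numpy as np
import pandas as pd

# Toy data from the answer above.
df = pd.DataFrame({'Sample ID': [1, 1, 1, 4, 4, 5, 6, 6],
                   'Protein Change': ['A', 'B', 'C', 'D', 'A', 'C', 'A', 'B']})

# Samples x protein changes, binarized so a change listed twice in one sample counts once.
ct = (pd.crosstab(df['Sample ID'], df['Protein Change']) > 0).astype(int)

# (changes x samples) . (samples x changes): entry (i, j) sums ct[s, i] * ct[s, j]
# over samples s, i.e. the number of samples containing both i and j.
co_occurrence = ct.T.dot(ct)

# The diagonal is the number of samples that carry each change.
per_change = pd.Series(np.diag(co_occurrence.to_numpy()), index=co_occurrence.index)

print(co_occurrence)
print(per_change)
```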