Creating a heatmap matrix using two categorical values on the axes and the average of a third column as the value

Question

I have a dataset with 3 columns, as in the example below (the actual dataset has 30K rows):

age group	heightGroup	weight
4-5	60-70	50
5-6	70-80	52
4-5	70-80	50
5-6	70-80	57
6-7	60-70	54
4-5	50-60	50
5-6	70-80	43

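I am trying to create a heatmap where the Y axis is the age group and the X axis is the height group, both as categorical values. The value of each heatmap cell should be the average weight for that cell. How can I visualize this matrix in Python? Thanks in advance.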

Answer 1

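You can create a pivot_table that aggregates the mean of the weight. If needed, you can make the height and age columns categorical to fix a specific order.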

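import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO

# Sample data from the question, separated by whitespace
data_str = '''"age group" heightGroup weight
4-5 60-70   50
5-6 70-80   52
4-5 70-80   50
5-6 70-80   57
6-7 60-70   54
4-5 50-60   50
5-6 70-80   43'''
# sep=r'\s+' splits on any run of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(StringIO(data_str), sep=r'\s+')
# Rows: age group, columns: heightGroup, cell values: mean weight
df_pivoted = df.pivot_table("weight", "age group", "heightGroup", aggfunc='mean')
# Annotate each cell with its mean, shown with one decimal
ax = sns.heatmap(data=df_pivoted, annot=True, fmt='.1f')
plt.show()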
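The answer mentions making the height and age columns categorical to fix their order. Below is a minimal sketch of that step, assuming every group label has the form "low-high" so that labels can be sorted by their lower bound. The helper lower_bound is a hypothetical name introduced here, not part of pandas. Plain string sorting would, for example, place a label such as "100-110" before "50-60".

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "age group":   ["4-5", "5-6", "4-5", "5-6", "6-7", "4-5", "5-6"],
    "heightGroup": ["60-70", "70-80", "70-80", "70-80", "60-70", "50-60", "70-80"],
    "weight":      [50, 52, 50, 57, 54, 50, 43],
})

def lower_bound(label):
    # Hypothetical helper: assumes labels look like "low-high"
    return int(label.split("-")[0])

for col in ["age group", "heightGroup"]:
    # Order the categories numerically by lower bound instead of alphabetically
    ordered = sorted(df[col].unique(), key=lower_bound)
    df[col] = pd.Categorical(df[col], categories=ordered, ordered=True)

# The pivot table rows and columns now follow the category order
df_pivoted = df.pivot_table("weight", "age group", "heightGroup",
                            aggfunc="mean", observed=False)
print(df_pivoted)
```

The resulting df_pivoted can be passed to sns.heatmap exactly as before; the axes then follow the category order.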

PS: To mask out all cells with a count of 1 (or 0):

df_pivoted_count = df.pivot_table("weight", "age group", "heightGroup", aggfunc='count').fillna(0)
ax = sns.heatmap(data=df_pivoted, mask=df_pivoted_count <= 1, annot=True, fmt='.1f')
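To see what this mask does, here is a small sketch that lists the cells that would remain visible for the sample data from the question. The variable name visible is introduced here for illustration only. In a seaborn heatmap mask, True means the cell is hidden.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "age group":   ["4-5", "5-6", "4-5", "5-6", "6-7", "4-5", "5-6"],
    "heightGroup": ["60-70", "70-80", "70-80", "70-80", "60-70", "50-60", "70-80"],
    "weight":      [50, 52, 50, 57, 54, 50, 43],
})

# Number of rows per (age group, heightGroup) cell; empty cells become 0
df_pivoted_count = df.pivot_table("weight", "age group", "heightGroup",
                                  aggfunc="count").fillna(0)

# True where the cell would be hidden (count of 0 or 1)
mask = df_pivoted_count <= 1

# Cells that stay visible in the heatmap
visible = [(age, height)
           for age in mask.index
           for height in mask.columns
           if not mask.loc[age, height]]
print(visible)  # [('5-6', '70-80')]
```

With only seven sample rows, a single cell has more than one observation; on the full 30K-row dataset far more cells would be expected to pass the threshold.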

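To show a heatmap colored by count: the count dataframe (without .fillna()) can be used for data=, and the mean dataframe for annot=. The code below also changes the colorbar ticks, to prevent non-integer ticks from being shown in this example.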

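# Means are used for the cell annotations, counts for the cell colors
df_pivoted = df.pivot_table("weight", "age group", "heightGroup", aggfunc='mean')
df_pivoted_count = df.pivot_table("weight", "age group", "heightGroup", aggfunc='count')
# linewidths takes a number; the colorbar ticks are restricted to integers from 1 to the maximum count
ax = sns.heatmap(data=df_pivoted_count, annot=df_pivoted, fmt='.1f', cmap='flare',
                 linecolor='skyblue', linewidths=2, clip_on=False, square=True,
                 cbar_kws={'ticks': range(1, int(df_pivoted_count.max().max()+1))})
plt.show()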


python

heatmap

seaborn