How do I add within-group percentages as bar labels on a seaborn count plot?

My goal is to create bar plots with counts on the y-axis and label each bar with its percentage of the group. The code below gets me halfway there:

import matplotlib.pyplot as plt
import seaborn as sns

titanic = sns.load_dataset("titanic")

features = ['sex', 'class', 'who', 'adult_male']
n = 1

plt.figure(figsize=[12, 14])

for f in features:
    plt.subplot(3, 2, n)
    ax = sns.countplot(x=f, hue='survived', edgecolor='black', alpha=0.8, data=titanic)
    sns.despine()
    plt.title("Countplot of {}  by alive".format(f))
    n=n+1
    plt.tight_layout()
    
    for c in ax.containers:
        labels = [f'{h/titanic.survived.count()*100:0.1f}%' if (h := v.get_height()) > 0 else '' for v in c]
        ax.bar_label(c,labels=labels, label_type='edge')

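The problem is that the percentages are incorrect. For example, in the "Countplot of sex by survived" chart, the male percentage is calculated as the share of males in the "0" class out of the entire dataset.

How do I adjust my code to calculate the percentage of males in the "0" class out of the male category only? The blue bar in the male category should read 81% and the orange bar 19%.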

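Generate the within-feature proportions manually, e.g. for the feature sex:

  1. Compute the proportion of each survived value within each sex using groupby.value_counts()
  2. Access a given bar's proportion via its group (male/female sex) and label (0/1 survived)
    • The groups are ordered differently depending on dtype, so unique() will not always work (see the full example in the next section)
    • The label of a container c is c.get_label(), which can be converted to the appropriate type via df[hue].dtype.type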
import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('titanic')

feat = 'sex'
hue = 'survived'
hue_type = df[hue].dtype.type

groups = df[feat].unique()
proportions = df.groupby(feat)[hue].value_counts(normalize=True)
# sex     survived
# female  1           0.742038
#         0           0.257962
# male    0           0.811092
#         1           0.188908
# Name: survived, dtype: float64

ax = sns.countplot(x=feat, hue=hue, data=df)

for c in ax.containers:
    labels = [f'{proportions.loc[g, hue_type(c.get_label())]:.1%}' for g in groups]
    # proportions.loc['male', 0] => 0.811092
    # proportions.loc['male', 1] => 0.188908
    # proportions.loc['female', 0] => 0.257962
    # proportions.loc['female', 1] => 0.742038

    ax.bar_label(c, labels)
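
The lookup step can be checked without drawing anything. The following is a minimal sketch on a hypothetical six-row frame (toy, not the real titanic data) that assumes the container label arrives as a string, as c.get_label() returns, and converts it with the column's dtype before indexing:

```python
import pandas as pd

# Hypothetical toy data standing in for the titanic columns.
toy = pd.DataFrame({
    'sex': ['male', 'male', 'male', 'male', 'female', 'female'],
    'survived': [0, 0, 0, 1, 1, 0],
})

# Proportion of each survived value within each sex.
proportions = toy.groupby('sex')['survived'].value_counts(normalize=True)

# Container labels are strings, so convert them to the hue column's type.
hue_type = toy['survived'].dtype.type  # numpy.int64 for this column
label = '0'                            # stand-in for c.get_label()
male_died = proportions.loc['male', hue_type(label)]

print(f'{male_died:.1%}')  # 75.0% (3 of the 4 males have survived == 0)
```

Because value_counts(normalize=True) is applied per group, the proportions within each sex sum to 1, which is what makes the labels within-group rather than dataset-wide percentages.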


Full example with all features:

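import matplotlib.pyplot as plt
import seaborn as sns

titanic = sns.load_dataset('titanic')

features = ['sex', 'class', 'who', 'adult_male']
hue = 'survived'
hue_type = titanic[hue].dtype.type  # used to convert container labels back to the hue dtype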

fig, axs = plt.subplots(2, 2, figsize=(10, 10), constrained_layout=True)

for feat, ax in zip(features, axs.ravel()):
    # group ordering differs by dtype
    col = titanic[feat]
    if col.dtype == 'category':
        groups = col.cat.categories
    elif col.dtype == 'bool':
        groups = [False, True]
    else:
        groups = col.unique()

    # within-feature proportions
    proportions = titanic.groupby(feat)[hue].value_counts(normalize=True)
    
    sns.countplot(x=feat, hue=hue, edgecolor='k', alpha=0.8, data=titanic, ax=ax)
    # plain text title: inside $...$ mathtext, the "_" in adult_male would render as a subscript
    ax.set_title(f'Countplot of {feat} by {hue}')
    sns.despine()

    # retrieve proportions by the container's label (hue) and group (feature)
    for c in ax.containers:
        labels = [f'{proportions.loc[g, hue_type(c.get_label())]:.1%}' for g in groups]
        ax.bar_label(c, labels)
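
To see why unique() is not always enough for the group order, here is a small sketch using only pandas. The helper name group_order is hypothetical; it mirrors the three dtype branches of the full example, and the sample series are made up for illustration:

```python
import pandas as pd

def group_order(col):
    """Return the groups of `col` using the same dtype branches as the full example."""
    if isinstance(col.dtype, pd.CategoricalDtype):
        return list(col.cat.categories)  # declared category order
    if pd.api.types.is_bool_dtype(col):
        return [False, True]             # fixed order for booleans
    return list(col.unique())            # order of first appearance

# Hypothetical categorical column whose data order differs from its category order.
cls = pd.Series(pd.Categorical(['Third', 'First', 'Second', 'Third'],
                               categories=['First', 'Second', 'Third']))

print(list(cls.unique()))  # ['Third', 'First', 'Second']
print(group_order(cls))    # ['First', 'Second', 'Third']
```

If the labels were built from unique() here, the "Third" proportion would end up on the first bar, so the order has to follow the categories instead.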