How to create a subplot for each group of a pandas column

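In the Titanic dataset, I need to create a chart that shows the percentage of passengers in each class who survived. It should consist of three pie charts: class 1 survivors and non-survivors, class 2 survivors and non-survivors, and class 3 survivors and non-survivors.

How can I do this? I have tried this kind of code, but it produces the wrong values: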

import pandas as pd
import seaborn as sns  # for dataset

df_titanic = sns.load_dataset('titanic')

# df_titanic.head(3)
   survived  pclass     sex   age  sibsp  parch     fare embarked  class    who  adult_male deck  embark_town alive  alone
0         0       3    male  22.0      1      0   7.2500        S  Third    man        True  NaN  Southampton    no  False
1         1       1  female  38.0      1      0  71.2833        C  First  woman       False    C    Cherbourg   yes  False
2         1       3  female  26.0      0      0   7.9250        S  Third  woman       False  NaN  Southampton   yes   True

# count class-1 survivors and non-survivors
c1s = len(df_titanic[(df_titanic.pclass == 1) & (df_titanic.survived == 1)].value_counts())
c1ns = len(df_titanic[(df_titanic.pclass == 1) & (df_titanic.survived == 0)].value_counts())
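The `len(...value_counts())` lines miscount because `DataFrame.value_counts()` counts distinct rows, and by default it drops any row that has a NaN in some column (the titanic data has many NaN values in `deck` and `age`). So `len()` of the result is not the number of passengers. Summing the boolean mask counts the rows directly. A minimal sketch, using a small made-up DataFrame with the same column names as a hypothetical stand-in for `sns.load_dataset('titanic')` so it runs offline:

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for the titanic data (same column names, made-up rows)
df = pd.DataFrame({
    "pclass":   [1, 1, 1, 2, 3, 3],
    "survived": [1, 1, 0, 1, 0, 0],
    "deck":     ["C", np.nan, "C", np.nan, np.nan, np.nan],
})

mask = (df.pclass == 1) & (df.survived == 1)

# len(...value_counts()) counts distinct NaN-free rows, not passengers
wrong = len(df[mask].value_counts())

# Summing a boolean mask counts the True values, i.e. the matching passengers
c1s = int(mask.sum())
c1ns = int(((df.pclass == 1) & (df.survived == 0)).sum())

print(wrong, c1s, c1ns)  # 1 2 1
```

Equivalently, `len(df[mask])` counts the filtered rows without calling `value_counts()` at all.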

This code produces the correct values, but I need them in three separate pie charts:

df_titanic.groupby(['pclass', 'survived']).size().plot(kind='pie', autopct='%.2f')

class: 1, 2, 3; survived: 0, 1
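The grouped sizes are already correct. They just sit in one Series with a `(pclass, survived)` MultiIndex, which pandas draws as a single pie. A minimal sketch of moving each class into its own column with `Series.unstack`, again on hypothetical made-up rows instead of the real dataset:

```python
import pandas as pd

# Hypothetical toy stand-in for the titanic data (same column names, made-up rows)
df = pd.DataFrame({
    "pclass":   [1, 1, 1, 2, 2, 3, 3, 3],
    "survived": [1, 1, 0, 1, 0, 0, 0, 1],
})

# Series indexed by (pclass, survived) holding the row count of each group
sizes = df.groupby(["pclass", "survived"]).size()

# Unstacking the pclass level gives rows = survived, one column per class;
# fill_value=0 covers any class that has no rows for one of the outcomes
wide = sizes.unstack("pclass", fill_value=0)
print(wide)
```

This wide table has the same shape as the `pd.crosstab(df.survived, df.pclass)` result used in the answer, so either one can be passed to `plot(kind='pie', subplots=True)`.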

Code:

import matplotlib.pyplot as plt

labels = ["not survived", "survived"]
fig, axs = plt.subplots(1, 3)
# the seaborn titanic dataset uses lowercase column names ('pclass', 'survived')
axs[0].pie(df_titanic[df_titanic["pclass"] == 1].groupby(["survived"]).size(), labels=labels, autopct='%1.1f%%')
axs[1].pie(df_titanic[df_titanic["pclass"] == 2].groupby(["survived"]).size(), labels=labels, autopct='%1.1f%%')
axs[2].pie(df_titanic[df_titanic["pclass"] == 3].groupby(["survived"]).size(), labels=labels, autopct='%1.1f%%')
plt.show()

Result: [image not included]
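The three hard-coded `axs[...]` lines can also be written as a loop over `groupby`, which sizes the figure from the number of groups instead of assuming three. A sketch under the same hypothetical toy-data assumption; the `Agg` backend is only there so it also runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy stand-in for the titanic data (same column names, made-up rows)
df = pd.DataFrame({
    "pclass":   [1, 1, 1, 2, 2, 3, 3, 3],
    "survived": [1, 1, 0, 1, 0, 0, 0, 1],
})

labels = ["not survived", "survived"]
groups = df.groupby("pclass")

# One axes per group, created from the number of groups
fig, axs = plt.subplots(1, groups.ngroups, figsize=(12, 4))

for ax, (pclass, grp) in zip(axs, groups):
    # reindex keeps both outcomes (0, 1) in a fixed order, matching `labels`
    counts = grp["survived"].value_counts().reindex([0, 1], fill_value=0)
    ax.pie(counts, labels=labels, autopct="%1.1f%%")
    ax.set_title(f"pclass {pclass}")

fig.tight_layout()
```

The `reindex` step matters because `value_counts()` sorts by frequency, not by value, so without it the wedge order (and therefore the labels) could differ between classes.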

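  1. The correct way to get subplots using pandas is to reshape the dataframe, because subplots=True draws one subplot per column. pandas.crosstab is used to shape the dataframe.
  2. Then plot with pandas.DataFrame.plot, using kind='pie' and subplots=True.
  • Additional code has been added for formatting:
    • rotate the pclass labels
    • add a plot title
    • use a custom legend instead of a legend for each subplot
      • specify the labels for the legend
      • specify one color per label
  • Tested in python 3.8.12, pandas 1.3.4, matplotlib 3.4.3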
import seaborn as sns  # for titanic data only
import pandas as pd
from matplotlib.patches import Patch  # to create the colored squares for the legend

# load the dataframe
df = sns.load_dataset('titanic')

# reshaping the dataframe is the most important step
ct = pd.crosstab(df.survived, df.pclass)

# display(ct)
# pclass      1   2    3
# survived
# 0          80  97  372
# 1         136  87  119

# plot and add labels
colors = ['tab:blue', 'tab:orange']  # specify the colors so they can be used in the legend
labels = ["not survived", "survived"]  # used for the legend
axes = ct.plot(kind='pie', autopct='%.1f%%', subplots=True, figsize=(12, 5),
               legend=False, labels=['', ''], colors=colors)

# flatten the array of axes into a 1-D ndarray that can be indexed and iterated
axes = axes.ravel()

# extract the figure object
fig = axes[0].get_figure()

# rotate the pclass label
for ax in axes:
    yl = ax.get_ylabel()
    ax.set_ylabel(yl, rotation=0, fontsize=12)
    
# create the legend
legend_elements = [Patch(fc=c, label=l) for c, l in zip(colors, labels)]
fig.legend(handles=legend_elements, loc='upper center', fontsize=12, ncol=2,
           borderaxespad=0, bbox_to_anchor=(0., 0.8, 1, .102), frameon=False)

fig.tight_layout()
fig.suptitle('pclass survival', fontsize=15)

Formatted plot: [image not included]

Unformatted plot:

axes = ct.plot(kind='pie', autopct='%.1f%%', subplots=True, figsize=(12, 5), labels=["not survived", "survived"])
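The question asks for the percentage of survivors in each class. `autopct` already prints those shares on the wedges; if the numbers are also needed as a table, `pd.crosstab` can compute them with `normalize='columns'`. A sketch on hypothetical made-up rows (not the real titanic counts):

```python
import pandas as pd

# Hypothetical toy stand-in for the titanic data (same column names, made-up rows)
df = pd.DataFrame({
    "pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "survived": [1, 1, 0, 1, 0, 0, 0, 0, 1],
})

# normalize="columns" divides each column by its total, so each pclass column
# sums to 1 and holds the survived / not-survived share within that class
pct = pd.crosstab(df.survived, df.pclass, normalize="columns") * 100
print(pct.round(1))
```

Each column here corresponds to one pie in the plots above, with row 1 being the survival percentage for that class.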