How to create a stacked bar chart in Python, color coded by category
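I'm working with the popular Titanic dataset from Kaggle, and I want to create a bar chart showing the number of survivors versus the number of deceased, by gender. On the x-axis I want gender (male/female). I want the survivors and the deceased stacked and color coded.
Here is my current code, which produces four bars, one for each combination of male/survived, male/deceased, female/survived, female/deceased: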
import pandas as pd
import seaborn as sns # for the data
df = sns.load_dataset('titanic').loc[:, ['sex', 'survived']]
df.groupby('sex').survived.value_counts().plot(kind='bar', color=['C0', 'C1'], stacked=True)
Current output
With some sample data, I believe this is what you are looking for, using matplotlib:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Sex':['M','F','M','F','M','F','M','F','M','F','F','F','M','F','F','F'],
'Survived':['Y','Y','N','Y','N','Y','N','Y','Y','Y','Y','Y','Y','Y','N','N']})
grouped = df.groupby(['Sex','Survived'],as_index=False).agg(Count=pd.NamedAgg(column="Survived", aggfunc="count"))
fig, ax = plt.subplots()
ax.bar(grouped[grouped['Sex'] =='F']['Survived'], grouped[grouped['Sex']=='F']['Count'],label='F')
ax.bar(grouped[grouped['Sex'] =='M']['Survived'], grouped[grouped['Sex']=='M']['Count'],label='M',bottom=grouped[grouped['Sex']=='F']['Count'])
ax.set_ylabel("Number of passengers")
ax.set_xlabel("Survived status")
ax.set_title('Passengers by survival status and gender')
ax.legend()
plt.show()
Here is the output:
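Note that the plot above puts survival status on the x-axis and stacks by sex, while the question asked for sex on the x-axis with the outcomes stacked. A minimal sketch of that orientation, reusing the same sample data, could look like the following. The helper name `plot_stacked_by_sex` and the legend labels are made up for illustration, and the counting step via `groupby(...).size().unstack()` is just one of several ways to build the table.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
sample = pd.DataFrame({
    'Sex': ['M', 'F', 'M', 'F', 'M', 'F', 'M', 'F',
            'M', 'F', 'F', 'F', 'M', 'F', 'F', 'F'],
    'Survived': ['Y', 'Y', 'N', 'Y', 'N', 'Y', 'N', 'Y',
                 'Y', 'Y', 'Y', 'Y', 'Y', 'Y', 'N', 'N'],
})


def plot_stacked_by_sex(df):
    """Draw one bar per sex, with one stacked segment per survival outcome."""
    # Rows are sexes, columns are outcomes ('N', 'Y'), values are passenger counts.
    counts = df.groupby(['Sex', 'Survived']).size().unstack(fill_value=0)

    fig, ax = plt.subplots()
    bottom = np.zeros(len(counts))  # running height of each bar so far
    for outcome in counts.columns:
        heights = counts[outcome].to_numpy()
        ax.bar(list(counts.index), heights, bottom=bottom,
               label=f'Survived: {outcome}')
        bottom = bottom + heights  # the next segment starts where this one ends

    ax.set_xlabel('Sex')
    ax.set_ylabel('Number of passengers')
    ax.set_title('Passengers by gender and survival status')
    ax.legend()
    return counts, ax


counts, ax = plot_stacked_by_sex(sample)
# plt.show()  # uncomment to display the figure
```

Each pass through the loop draws one colored segment for every sex, and `bottom` keeps track of where the next segment has to start.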
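- The easiest way is to reshape the DataFrame with pandas.DataFrame.pivot_table, and then plot with pandas.DataFrame.plot, specifying kind='bar' and stacked=True.
  - The important thing to remember is to shape the data into the format the plotting API expects.
- Using pandas v1.2.4 and matplotlib v3.3.4 (pandas imports matplotlib as its plotting dependency).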
import seaborn as sns # used for the titanic data
import pandas as pd
# load the two necessary columns
df = sns.load_dataset('titanic').loc[:, ['sex', 'survived']]
# create a pivot table
dfp = df.pivot_table(index='sex', columns=['survived'], aggfunc=len)
# display(dfp)
survived    0    1
sex
female     81  233
male      468  109
# plot the dataframe
dfp.plot(kind='bar', stacked=True, ylabel='Counts', xlabel='Gender',
title='Survival Status Count by Gender', rot=0)
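The pivot step above relies on `aggfunc=len` with no value column left over, and that behaviour may differ between pandas versions. As a hedged alternative, the same table of counts can be built with `pandas.crosstab`, or by calling `.unstack()` on the `value_counts()` result from the question (which is a Series with a two-level index, hence the four separate bars). The small `toy` frame below is a hypothetical stand-in for the Titanic data so the sketch runs without a download, and `survival_counts` is an illustrative name.

```python
import pandas as pd


def survival_counts(df):
    """Count passengers per (sex, survived) pair: rows are sexes, columns are outcomes."""
    return pd.crosstab(df['sex'], df['survived'])


# Hypothetical stand-in for the two titanic columns used in this thread.
toy = pd.DataFrame({
    'sex': ['male', 'female', 'male', 'female', 'male'],
    'survived': [0, 1, 0, 0, 1],
})

counts = survival_counts(toy)

# Same table, starting from the groupby/value_counts call in the question:
# unstack() moves the inner index level ('survived') into the columns.
alt = toy.groupby('sex')['survived'].value_counts().unstack(fill_value=0)

print(counts)
```

Either table can then be passed to `.plot(kind='bar', stacked=True, rot=0)` exactly as `dfp` is above.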
- I do not recommend stacked bars, because they make it harder to differentiate and compare the values of each category.
dfp.plot(kind='bar', stacked=False, ylabel='Counts', xlabel='Gender',
title='Survival Status Count by Gender', rot=0)
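For this kind of elaborate plot, especially with a DataFrame, I prefer to use plotly, because the output is more interactive. I did not use group operations, because logical indexing does the trick.
Finally, because of how the histograms are overlaid, only the deceased are drawn explicitly; the survivors are the area that is left over in each bar. If you want to represent them anyway (in a different color), feel free to comment.
Hope it solves your problem!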
import plotly.graph_objects as go
import seaborn as sns  # used for the titanic data

df = sns.load_dataset('titanic').loc[:, ['sex', 'survived']]
# logical indexing instead of a groupby
male_df = df[df['sex'] == 'male']
female_df = df[df['sex'] == 'female']

# background bars: total number of passengers per sex
fig = go.Figure(go.Histogram(
    x=df['sex'], bingroup=1, name='total number of male/female'
))
# overlaid bars: deceased passengers per sex
fig.add_trace(go.Histogram(
    x=male_df[male_df['survived'] == 0]['sex'], bingroup=1, name='number of deceased male'
))
fig.add_trace(go.Histogram(
    x=female_df[female_df['survived'] == 0]['sex'], bingroup=1, name='number of deceased female'
))
fig.update_layout(
    title='Passengers by survival status and gender',
    barmode='overlay',
    bargap=0.1
)
fig.show()