How to calculate percent by row and annotate 100 percent stacked bars
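I need help adding the percent distribution of the total (no decimals) to each segment of a stacked bar plot that I made in pandas from a crosstab of a dataframe.
Here is the sample data: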
import pandas as pd

data = {
    'Name': ['Alisa', 'Bobby', 'Bobby', 'Alisa', 'Bobby', 'Alisa',
             'Alisa', 'Bobby', 'Bobby', 'Alisa', 'Bobby', 'Alisa'],
    'Exam': ['Semester 1', 'Semester 1', 'Semester 1', 'Semester 1', 'Semester 1', 'Semester 1',
             'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2'],
    'Subject': ['Mathematics', 'Mathematics', 'English', 'English', 'Science', 'Science',
                'Mathematics', 'Mathematics', 'English', 'English', 'Science', 'Science'],
    'Result': ['Pass', 'Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Pass', 'Fail', 'Fail', 'Pass', 'Pass', 'Fail']}
df = pd.DataFrame(data)
# display(df)
Name Exam Subject Result
0 Alisa Semester 1 Mathematics Pass
1 Bobby Semester 1 Mathematics Pass
2 Bobby Semester 1 English Fail
3 Alisa Semester 1 English Pass
4 Bobby Semester 1 Science Fail
5 Alisa Semester 1 Science Pass
6 Alisa Semester 2 Mathematics Pass
7 Bobby Semester 2 Mathematics Fail
8 Bobby Semester 2 English Fail
9 Alisa Semester 2 English Pass
10 Bobby Semester 2 Science Pass
11 Alisa Semester 2 Science Fail
Here is my code:
import matplotlib.pyplot as plt

# crosstab: percent of each row's total
pal = ["royalblue", "dodgerblue", "lightskyblue", "lightblue"]
ax = pd.crosstab(df['Name'], df['Subject']).apply(lambda r: r/r.sum()*100, axis=1)
ax.plot.bar(figsize=(10, 10), stacked=True, rot=0, color=pal)
display(ax)  # display() is available in Jupyter/IPython
plt.legend(loc='best', bbox_to_anchor=(0.1, 1.0), title="Subject")
plt.xlabel('Name')
plt.ylabel('Percent Distribution')
plt.show()
I know I need to add plt.text somehow, but I can't figure it out. I would like the percentage of the total embedded in each stacked bar segment.
Let's try:
# crosstab: percent of each row's total
pal = ["royalblue", "dodgerblue", "lightskyblue", "lightblue"]
ax = pd.crosstab(df['Name'], df['Subject']).apply(lambda r: r/r.sum()*100, axis=1)
ax_1 = ax.plot.bar(figsize=(10, 10), stacked=True, rot=0, color=pal)
display(ax)  # display() is available in Jupyter/IPython
plt.legend(loc='upper center', bbox_to_anchor=(0.1, 1.0), title="Subject")
plt.xlabel('Name')
plt.ylabel('Percent Distribution')
# each patch is one stacked segment; its height is that segment's percentage
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(rec.get_x() + rec.get_width() / 2,
              rec.get_y() + height / 2,  # vertical middle of the segment
              "{:.0f}%".format(height),
              ha='center',
              va='center')
plt.show()
Output:
Subject English Mathematics Science
Name
Alisa 33.333333 33.333333 33.333333
Bobby 33.333333 33.333333 33.333333
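Every segment above is 33.33% because, in the sample data, each name has exactly one row per subject per semester, so two rows per subject out of six rows per name. A quick sketch to confirm the raw counts; it rebuilds only the Name and Subject columns of the sample data, since those are the only ones the crosstab uses:

```python
import pandas as pd

# Name and Subject columns copied from the sample data in the question
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Bobby', 'Alisa', 'Bobby', 'Alisa',
             'Alisa', 'Bobby', 'Bobby', 'Alisa', 'Bobby', 'Alisa'],
    'Subject': ['Mathematics', 'Mathematics', 'English', 'English', 'Science', 'Science',
                'Mathematics', 'Mathematics', 'English', 'English', 'Science', 'Science'],
})

# raw frequency counts: rows are names, columns are subjects
counts = pd.crosstab(df['Name'], df['Subject'])
print(counts)

# every cell is 2 and every row totals 6, so each share is 2 / 6 * 100 = 33.33%
print(counts.div(counts.sum(axis=1), axis=0).mul(100).round(2))
```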
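- As of matplotlib 3.4.0, use matplotlib.pyplot.bar_label
  - See this answer for a thorough explanation of using the method, and for additional examples.
  - label_type='center' annotates each segment with its own value; label_type='edge' annotates each segment with the cumulative sum of the segments up to its top.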
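- Plotting stacked bars is easiest with pandas.DataFrame.plot, using kind='bar' and stacked=True
- Get the percentages in a vectorized way (without .apply):
  - Use pd.crosstab to get the frequency counts
  - Divide ct by ct.sum(axis=1) along axis=0
  - Multiply by 100 and round.
- It's best to do this with .crosstab, because it produces a dataframe with the correct shape for plotting stacked bars. .groupby would require further reshaping of the dataframe.
- Tested in python 3.10, pandas 1.3.4, matplotlib 3.5.0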
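A sketch checking that the vectorized division gives the same row percentages as the question's .apply version, assuming the same Name/Subject data. It also shows pd.crosstab's normalize='index' option, a built-in alternative that the steps above do not use:

```python
import pandas as pd

# Name and Subject columns copied from the sample data in the question
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Bobby', 'Alisa', 'Bobby', 'Alisa',
             'Alisa', 'Bobby', 'Bobby', 'Alisa', 'Bobby', 'Alisa'],
    'Subject': ['Mathematics', 'Mathematics', 'English', 'English', 'Science', 'Science',
                'Mathematics', 'Mathematics', 'English', 'English', 'Science', 'Science'],
})

ct = pd.crosstab(df['Name'], df['Subject'])

# row-wise percentages with .apply (the question's approach, one Python call per row)
by_apply = ct.apply(lambda r: r / r.sum() * 100, axis=1)

# vectorized: ct.sum(axis=1) is one total per name; axis=0 aligns that
# Series with the row index, so every row is divided by its own total
by_div = ct.div(ct.sum(axis=1), axis=0).mul(100)

# alternative: let crosstab normalize each row (index) itself
by_normalize = pd.crosstab(df['Name'], df['Subject'], normalize='index').mul(100)

print(by_div.round(2))
```

Passing axis=1 to .div instead would try to align the per-name totals with the subject columns, which is not the intended division.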
import pandas as pd
import matplotlib.pyplot as plt
# get a frequency count using crosstab
ct = pd.crosstab(df['Name'], df['Subject'])
# vectorized calculation of the percent per row
ct = ct.div(ct.sum(axis=1), axis=0).mul(100).round(2)
# display(ct)
Subject English Mathematics Science
Name
Alisa 33.33 33.33 33.33
Bobby 33.33 33.33 33.33
# specify custom colors
pal = ["royalblue", "dodgerblue", "lightskyblue", "lightblue"]
# plot
ax = ct.plot(kind='bar', figsize=(10, 10), stacked=True, rot=0, color=pal, xlabel='Name', ylabel='Percent Distribution')
# move the legend
ax.legend(title='Subject', bbox_to_anchor=(1, 1.02), loc='upper left')
# iterate through each bar container
for c in ax.containers:
# add the annotations
ax.bar_label(c, fmt='%0.0f%%', label_type='center')
plt.show()
- Use label_type='edge' to annotate with the cumulative sum
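A sketch of the edge variant, assuming the same sample data, pandas 1.1 or newer (for the xlabel/ylabel keywords) and matplotlib 3.4 or newer. It reuses the plotting call from the answer but passes label_type='edge', so each label shows the running total at the top of its segment. The Agg backend line and the edge_texts list are additions so the sketch runs without a display and the label text can be inspected:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Name and Subject columns copied from the sample data in the question
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'Bobby', 'Alisa', 'Bobby', 'Alisa',
             'Alisa', 'Bobby', 'Bobby', 'Alisa', 'Bobby', 'Alisa'],
    'Subject': ['Mathematics', 'Mathematics', 'English', 'English', 'Science', 'Science',
                'Mathematics', 'Mathematics', 'English', 'English', 'Science', 'Science'],
})

# percent per row, as in the answer
ct = pd.crosstab(df['Name'], df['Subject'])
ct = ct.div(ct.sum(axis=1), axis=0).mul(100).round(2)

pal = ["royalblue", "dodgerblue", "lightskyblue", "lightblue"]
ax = ct.plot(kind='bar', figsize=(10, 10), stacked=True, rot=0, color=pal,
             xlabel='Name', ylabel='Percent Distribution')
ax.legend(title='Subject', bbox_to_anchor=(1, 1.02), loc='upper left')

edge_texts = []
for c in ax.containers:  # one BarContainer per subject column
    # 'edge' labels the top of each segment, i.e. the cumulative sum so far
    labels = ax.bar_label(c, fmt='%0.0f%%', label_type='edge')
    edge_texts.append([t.get_text() for t in labels])

print(edge_texts)
plt.close(ax.figure)
```

Each segment here is 33.33, so the three containers are labelled at 33.33, 66.66 and 99.99, which the format rounds to 33%, 67% and 100%.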