Get the count and percentage by grouping values in Pandas
I have the following DataFrame in pandas:
Score Risk
30 High Risk
50 Medium Risk
70 Medium Risk
40 Medium Risk
80 Low Risk
35 High Risk
65 Medium Risk
90 Low Risk
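I want to group by the values in the Risk column and get the count for each group, each group's percentage of the total, and an overall total row, as shown below: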
Expected output
Risk Category Count Percentage
High Risk 2 25.00
Medium Risk 4 50.00
Low Risk 2 25.00
Total 8 100.00
Can anyone explain how I can achieve the expected output?
You can use GroupBy.size to get the counts, compute the percentages from them, join the two with concat, add a Total row, and, if necessary, convert the index to a column at the end:
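import pandas as pd
# Number of rows for each Risk value
s = df.groupby('Risk')['Score'].size()
# Counts and their share of the total, side by side
df = pd.concat([s, s / s.sum() * 100], axis=1, keys=('count', 'Percentage'))
# Append a Total row holding the column sums
df.loc['Total'] = df.sum().astype(int)
print(df)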
count Percentage
Risk
High Risk 2 25.0
Low Risk 2 25.0
Medium Risk 4 50.0
Total 8 100.0
df = df.rename_axis('Risk Category').reset_index()
print (df)
Risk Category count Percentage
0 High Risk 2 25.0
1 Low Risk 2 25.0
2 Medium Risk 4 50.0
3 Total 8 100.0
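For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same GroupBy.size approach. It rebuilds the question's data, and it assumes you want the rows in the order shown in the expected output (High, Medium, Low) rather than alphabetical order. The names `data`, `order`, `counts` and `summary` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({
    "Score": [30, 50, 70, 40, 80, 35, 65, 90],
    "Risk": ["High Risk", "Medium Risk", "Medium Risk", "Medium Risk",
             "Low Risk", "High Risk", "Medium Risk", "Low Risk"],
})

# Assumed display order, taken from the expected output in the question
order = ["High Risk", "Medium Risk", "Low Risk"]

# Count rows per Risk value, then put the groups in the desired order
counts = data.groupby("Risk")["Score"].size().reindex(order)

# Counts next to their share of the total
summary = pd.concat(
    [counts, counts / counts.sum() * 100],
    axis=1,
    keys=("Count", "Percentage"),
)

# Append the Total row; the sum comes back as floats, so restore integer counts
summary.loc["Total"] = summary.sum()
summary["Count"] = summary["Count"].astype(int)

summary = summary.rename_axis("Risk Category").reset_index()
print(summary.round(2))
```

If a Risk value listed in `order` were missing from the data, `reindex` would introduce NaN for it, so you would need to fill that in (for example with 0) before computing the percentages.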
You can also use pivot_table to get a fairly clean answer, since it can create the margin totals for you automatically.
summary = (
    df.pivot_table(
        # margins=True appends a row of totals, named by margins_name
        index='Risk', aggfunc='count', margins=True, margins_name='Total'
    )
    .assign(Percentage=lambda df: df['Score'] / df.loc['Total', 'Score'] * 100)
    .rename_axis('Risk Category')
    .reset_index()
)
print(summary)
Risk Category Score Percentage
0 High Risk 2 25.0
1 Low Risk 2 25.0
2 Medium Risk 4 50.0
3 Total 8 100.0
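Here is a self-contained sketch of the pivot_table variant, again rebuilding the question's data. Two things differ from the snippet above and are my own choices: `values='Score'` is passed explicitly so the counted column does not depend on which other columns the frame has, and the `Score` column is renamed to `Count` to match the expected output. The name `data` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({
    "Score": [30, 50, 70, 40, 80, 35, 65, 90],
    "Risk": ["High Risk", "Medium Risk", "Medium Risk", "Medium Risk",
             "Low Risk", "High Risk", "Medium Risk", "Low Risk"],
})

summary = (
    data.pivot_table(
        index="Risk",
        values="Score",        # column whose entries are counted
        aggfunc="count",
        margins=True,          # append a row with the overall count
        margins_name="Total",
    )
    # Each count as a share of the overall count in the Total row
    .assign(Percentage=lambda t: t["Score"] / t.loc["Total", "Score"] * 100)
    .rename(columns={"Score": "Count"})
    .rename_axis("Risk Category")
    .reset_index()
)
print(summary)
```

The groups come out in alphabetical order here, with the Total row last.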