How to include a sort criterion in a pivot_table function?
Below is my code, which uses the pivot_table function on the DataFrame df.
    import pandas as pd

    df = pd.DataFrame({
        'State': ['B', 'B', 'A', 'A', 'C', 'C'],
        'Age': ['1 to 5', '6 to 10', '1 to 5', '6 to 10', '1 to 5', '6 to 10'],
        'Vaccinated': [80, 20, 30, 60, 10, 15],
        'Population': [100, 100, 100, 100, 100, 100],
        'Percentage': [0.80, 0.20, 0.30, 0.60, 0.10, 0.15]})
    # Sum the three value columns for each (State, Age) pair.
    # aggfunc='sum' replaces np.sum, which newer pandas versions warn about.
    df1 = pd.pivot_table(df, values=['Vaccinated', 'Population', 'Percentage'],
                         index=['State', 'Age'], aggfunc='sum')
Result of the code above:
                   Percentage  Population  Vaccinated
    State Age
    A     1 to 5         0.30         100          30
          6 to 10        0.60         100          60
    B     1 to 5         0.80         100          80
          6 to 10        0.20         100          20
    C     1 to 5         0.10         100          10
          6 to 10        0.15         100          15
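However, I would like to sort my records so that State B is at the top, followed by A, then C.
The rationale is that State B is 100% vaccinated (80% + 20%), State A is 90% (60% + 30%), and State C is 25% (10% + 15%). I tried adding a sort several times, but I ran into errors.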
Could I get some advice on how to add a sort criterion during or after pivot_table so that I get the following result?
                   Percentage  Population  Vaccinated
    State Age
    B     1 to 5         0.80         100          80
          6 to 10        0.20         100          20
    A     1 to 5         0.30         100          30
          6 to 10        0.60         100          60
    C     1 to 5         0.10         100          10
          6 to 10        0.15         100          15
One way is to create a helper column holding the group sum, sort df1 by it, and then drop it:
    df1 = (
        df1.assign(Sum=df1.groupby(level=0)['Vaccinated'].transform('sum'))
           # A stable sort keeps rows with equal totals in their original order.
           .sort_values(by='Sum', ascending=False, kind='stable')
           .drop(columns=['Sum'])
    )
    print(df1)
Prints:
                   Percentage  Population  Vaccinated
    State Age
    B     1 to 5         0.80         100          80
          6 to 10        0.20         100          20
    A     1 to 5         0.30         100          30
          6 to 10        0.60         100          60
    C     1 to 5         0.10         100          10
          6 to 10        0.15         100          15
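To make the helper-column step easier to follow, here is a minimal sketch that stops before the sort and drop, so the intermediate `Sum` column is visible. It rebuilds the question's data so that it runs on its own; the names `state_total` and `with_helper` are only illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["B", "B", "A", "A", "C", "C"],
    "Age": ["1 to 5", "6 to 10", "1 to 5", "6 to 10", "1 to 5", "6 to 10"],
    "Vaccinated": [80, 20, 30, 60, 10, 15],
    "Population": [100, 100, 100, 100, 100, 100],
    "Percentage": [0.80, 0.20, 0.30, 0.60, 0.10, 0.15],
})
df1 = pd.pivot_table(
    df,
    values=["Vaccinated", "Population", "Percentage"],
    index=["State", "Age"],
    aggfunc="sum",
)

# transform("sum") returns one value per row rather than one per group:
# every row receives the Vaccinated total of its own State
# (A -> 90, B -> 100, C -> 25).
state_total = df1.groupby(level="State")["Vaccinated"].transform("sum")

# with_helper is a hypothetical name for the intermediate frame.
with_helper = df1.assign(Sum=state_total)
print(with_helper)
```

Because every row of a state carries the same `Sum`, sorting on that column moves whole states together, which is why the column can be dropped again afterwards.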
We can use groupby sum on the State level to get the total Vaccinated per State, then sort_values to determine the order that the states should be in, then we can reindex to reorder based on the group totals:
    df1 = df1.reindex(
        # States ordered by their total Vaccinated, largest first.
        index=df1.groupby(level='State')['Vaccinated'].sum()
            .sort_values(ascending=False).index,
        level='State'
    )
df1:
                   Percentage  Population  Vaccinated
    State Age
    B     1 to 5         0.80         100          80
          6 to 10        0.20         100          20
    A     1 to 5         0.30         100          30
          6 to 10        0.60         100          60
    C     1 to 5         0.10         100          10
          6 to 10        0.15         100          15
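Both answers above sort after pivoting. For the "during pivot_table" part of the question, one possible alternative (not taken from either answer) is to turn State into an ordered categorical before pivoting, so that the grouping inside pivot_table follows the category order. This is only a sketch under that assumption; `order`, `df_cat`, and `pivoted` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["B", "B", "A", "A", "C", "C"],
    "Age": ["1 to 5", "6 to 10", "1 to 5", "6 to 10", "1 to 5", "6 to 10"],
    "Vaccinated": [80, 20, 30, 60, 10, 15],
    "Population": [100, 100, 100, 100, 100, 100],
    "Percentage": [0.80, 0.20, 0.30, 0.60, 0.10, 0.15],
})

# States ordered by total Vaccinated, largest first.
order = (
    df.groupby("State")["Vaccinated"].sum()
      .sort_values(ascending=False)
      .index
)

# Assumption: an ordered categorical makes the pivot sort by category order
# instead of alphabetically. df_cat is a copy, so df itself is unchanged.
df_cat = df.assign(
    State=pd.Categorical(df["State"], categories=order, ordered=True)
)

pivoted = pd.pivot_table(
    df_cat,
    values=["Vaccinated", "Population", "Percentage"],
    index=["State", "Age"],
    aggfunc="sum",
    observed=True,  # keep only State/Age combinations present in the data
)
print(pivoted)
```

The trade-off is that the State level of the result becomes categorical, which may matter for later joins or for appending states that are not among the categories. The reindex approach leaves the index dtype untouched.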