How to include a sort criterion in a pivot_table function?

Below is my code, which uses the pivot_table function on the DataFrame df.

import pandas as pd

df = pd.DataFrame({'State': ['B', 'B', 'A', 'A', 'C', 'C'],
                   'Age': ['1 to 5', '6 to 10', '1 to 5', '6 to 10', '1 to 5', '6 to 10'],
                   'Vaccinated': [80, 20, 30, 60, 10, 15],
                   'Population': [100, 100, 100, 100, 100, 100],
                   'Percentage': [0.80, 0.20, 0.30, 0.60, 0.10, 0.15]})

# aggfunc='sum' behaves like np.sum here and avoids the numpy import
df1 = pd.pivot_table(df, values=['Vaccinated', 'Population', 'Percentage'],
                     index=['State', 'Age'], aggfunc='sum')

Result of the code above:

               Percentage  Population  Vaccinated
State Age                                        
A     1 to 5         0.30         100          30
      6 to 10        0.60         100          60
B     1 to 5         0.80         100          80
      6 to 10        0.20         100          20
C     1 to 5         0.10         100          10
      6 to 10        0.15         100          15

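However, I want to sort my records so that State B is at the top, followed by A and then C. The rationale is that State B is 100% vaccinated (80% + 20%), State A is 90% (30% + 60%), and State C is 25% (10% + 15%). I tried adding a sort several times, but I ran into errors.

Could I get advice on how to add a sort criterion during or after pivot_table, so that I get the following result?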

               Percentage  Population  Vaccinated
State Age                                        
B     1 to 5         0.80         100          80
      6 to 10        0.20         100          20
A     1 to 5         0.30         100          30
      6 to 10        0.60         100          60
C     1 to 5         0.10         100          10
      6 to 10        0.15         100          15

One way is to make a helper column holding the group sum, sort the DataFrame by it, and then drop it:

df1 = df1.assign(Sum=df1.groupby(level=0).Vaccinated.transform('sum')).\
    sort_values(by='Sum', ascending=False).drop(columns=['Sum'])
print(df1)

Prints:

               Percentage  Population  Vaccinated
State Age                                        
B     1 to 5         0.80         100          80
      6 to 10        0.20         100          20
A     1 to 5         0.30         100          30
      6 to 10        0.60         100          60
C     1 to 5         0.10         100          10
      6 to 10        0.15         100          15
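One caveat with the helper column: every row within a state carries the same Sum, so the sort is ordering ties, and the default sort algorithm for sort_values (quicksort) is not documented as stable. The sketch below is a self-contained variant that passes kind='mergesort', a documented stable option, so the Age order inside each state is preserved. The names state_total and sorted_stable are illustrative only and do not appear in the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['B', 'B', 'A', 'A', 'C', 'C'],
    'Age': ['1 to 5', '6 to 10', '1 to 5', '6 to 10', '1 to 5', '6 to 10'],
    'Vaccinated': [80, 20, 30, 60, 10, 15],
    'Population': [100, 100, 100, 100, 100, 100],
    'Percentage': [0.80, 0.20, 0.30, 0.60, 0.10, 0.15],
})

df1 = pd.pivot_table(df, values=['Vaccinated', 'Population', 'Percentage'],
                     index=['State', 'Age'], aggfunc='sum')

# Total vaccinated per state, broadcast back onto every row of that state
state_total = df1.groupby(level='State')['Vaccinated'].transform('sum')

sorted_stable = (
    df1.assign(Sum=state_total)
       # a stable sort keeps the existing Age order among rows with equal Sum
       .sort_values(by='Sum', ascending=False, kind='mergesort')
       .drop(columns='Sum')
)
print(sorted_stable)
```

Two states with identical totals would keep their original relative order under this approach, since the stable sort never reorders equal keys.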

We can use groupby sum on the State level to get the total Vaccinated per State, then sort_values to determine the order the states should be in, and finally reindex to reorder the rows based on the group totals:

df1 = df1.reindex(
    index=df1.groupby(level='State')['Vaccinated'].sum()
        .sort_values(ascending=False).index,
    level='State'
)

df1:

               Percentage  Population  Vaccinated
State Age                                        
B     1 to 5         0.80         100          80
      6 to 10        0.20         100          20
A     1 to 5         0.30         100          30
      6 to 10        0.60         100          60
C     1 to 5         0.10         100          10
      6 to 10        0.15         100          15
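To make the intermediate step visible, here is a self-contained sketch that prints the per-state totals and the resulting state order before handing it to reindex. The variable names totals, order, and reordered are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['B', 'B', 'A', 'A', 'C', 'C'],
    'Age': ['1 to 5', '6 to 10', '1 to 5', '6 to 10', '1 to 5', '6 to 10'],
    'Vaccinated': [80, 20, 30, 60, 10, 15],
    'Population': [100, 100, 100, 100, 100, 100],
    'Percentage': [0.80, 0.20, 0.30, 0.60, 0.10, 0.15],
})

df1 = pd.pivot_table(df, values=['Vaccinated', 'Population', 'Percentage'],
                     index=['State', 'Age'], aggfunc='sum')

# One total per state: A -> 90, B -> 100, C -> 25
totals = df1.groupby(level='State')['Vaccinated'].sum()

# Index of the totals sorted from largest to smallest: B, A, C
order = totals.sort_values(ascending=False).index
print(list(order))

# Reorder only the State level; rows within each state keep their order
reordered = df1.reindex(index=order, level='State')
print(reordered)
```

Because only the State level is reindexed, no helper column has to be added and dropped.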