Python: Sort Pandas MultiIndex by max value of specified column

I am trying to sort a Python Pandas MultiIndex by the maximum value of a specific column, in this case INFORMATION_SURPLUS_PCT.

How can I sort the outer level while keeping the rows grouped and in their original order within each group?

I tried df.sort(['INFORMATION_SURPLUS_PCT'], ascending=False), but this loses the grouping of the rows. Any help is much appreciated!

Current MultiIndex input:

              INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT  
   SYMBOL                                                         
   AAL    0                   0.000000                 0.000000   
          1                  -0.008466                 1.000000   
          2                  -0.011333                 0.000000   
   ADI    0                   0.000000                 0.000000   
          1                  -0.010781                 2.000000   
          2                  -0.010414                 0.000000  
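To reproduce the input above, the frame can be rebuilt roughly like this. This is a minimal sketch that assumes the second index level is an unnamed row counter (0, 1, 2) and uses the values exactly as printed.

```python
import pandas as pd

# Two-level index: SYMBOL on the outside, an unnamed row counter inside
index = pd.MultiIndex.from_product([["AAL", "ADI"], [0, 1, 2]], names=["SYMBOL", None])

df = pd.DataFrame(
    {
        "INFORMATION_SURPLUS_DIFF": [0.0, -0.008466, -0.011333, 0.0, -0.010781, -0.010414],
        "INFORMATION_SURPLUS_PCT": [0.0, 1.0, 0.0, 0.0, 2.0, 0.0],
    },
    index=index,
)
print(df)
```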

Desired output:

              INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT  
   SYMBOL
   ADI    0                   0.000000                 0.000000 
          1                  -0.010781                 2.000000 
          2                  -0.010414                 0.000000  
   AAL    0                   0.000000                 0.000000  
          1                  -0.008466                 1.000000 
          2                  -0.011333                 0.000000 

You can `groupby` the first level, take the `max` of the column, sort the result with `sort_values` and take its index. Finally, `reindex` the DataFrame on the first level (`level=0`):

print(df)
          INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL                                                     
AAL    0                  0.000000                      0.0
       1                 -0.008466                      1.0
       2                 -0.011333                      0.0
ADI    0                  0.000000                      0.0
       1                 -0.010781                      2.0
       2                 -0.010414                      0.0

print(df.groupby(level=0)['INFORMATION_SURPLUS_PCT'].max().sort_values(ascending=False))
SYMBOL
ADI    2.0
AAL    1.0
Name: INFORMATION_SURPLUS_PCT, dtype: float64

idx = df.groupby(level=0)['INFORMATION_SURPLUS_PCT'].max().sort_values(ascending=False).index
print(idx)
Index(['ADI', 'AAL'], dtype='object', name='SYMBOL')

print(df.reindex(index=idx, level=0))
          INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL                                                     
ADI    0                  0.000000                      0.0
       1                 -0.010781                      2.0
       2                 -0.010414                      0.0
AAL    0                  0.000000                      0.0
       1                 -0.008466                      1.0
       2                 -0.011333                      0.0
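The same steps can be wrapped in a small reusable helper for Python 3. This is a sketch of the approach above, not a pandas API: `sort_groups_by_max` is a hypothetical name, and passing `kind="stable"` is an added assumption so that groups with equal maxima keep the order `groupby` produced (sorted by label by default).

```python
import pandas as pd


def sort_groups_by_max(frame: pd.DataFrame, column: str, ascending: bool = False) -> pd.DataFrame:
    """Reorder whole outer-level groups by each group's max of `column`.

    Rows inside each group keep their original order.
    """
    # One value per outer label (e.g. per SYMBOL): the max of `column` in that group
    group_max = frame.groupby(level=0)[column].max()
    # Outer labels ranked by that maximum; stable so ties keep groupby's label order
    order = group_max.sort_values(ascending=ascending, kind="stable").index
    # Reindexing on level 0 moves each group as a block
    return frame.reindex(index=order, level=0)


# Usage with the sample data from the question
index = pd.MultiIndex.from_product([["AAL", "ADI"], [0, 1, 2]], names=["SYMBOL", None])
df = pd.DataFrame(
    {
        "INFORMATION_SURPLUS_DIFF": [0.0, -0.008466, -0.011333, 0.0, -0.010781, -0.010414],
        "INFORMATION_SURPLUS_PCT": [0.0, 1.0, 0.0, 0.0, 2.0, 0.0],
    },
    index=index,
)
result = sort_groups_by_max(df, "INFORMATION_SURPLUS_PCT")
print(result)
```

Pass `ascending=True` to put the group with the smallest maximum first instead.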