Python: Sort Pandas MultiIndex by max value of specified column
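I am trying to sort a Python Pandas MultiIndex by the maximum value of a specific column, in this case INFORMATION_SURPLUS_PCT.
How can I sort the first level while keeping each group's rows together and in their original order?
I tried df.sort(['INFORMATION_SURPLUS_PCT'], ascending=False), but that loses the grouping of the rows. Any help is much appreciated!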
Current MultiIndex input:
          INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL
AAL    0                  0.000000                 0.000000
       1                 -0.008466                 1.000000
       2                 -0.011333                 0.000000
ADI    0                  0.000000                 0.000000
       1                 -0.010781                 2.000000
       2                 -0.010414                 0.000000
Desired output:
          INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL
ADI    0                  0.000000                 0.000000
       1                 -0.010781                 2.000000
       2                 -0.010414                 0.000000
AAL    0                  0.000000                 0.000000
       1                 -0.008466                 1.000000
       2                 -0.011333                 0.000000
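You can groupby the first level (level=0), find the max of INFORMATION_SURPLUS_PCT in each group, sort_values in descending order and take the index of the result. Finally, reindex the DataFrame on the first level (level=0) with that index: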
print(df)
          INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL
AAL    0                  0.000000                      0.0
       1                 -0.008466                      1.0
       2                 -0.011333                      0.0
ADI    0                  0.000000                      0.0
       1                 -0.010781                      2.0
       2                 -0.010414                      0.0
print(df.groupby(level=0)['INFORMATION_SURPLUS_PCT'].max().sort_values(ascending=False))
SYMBOL
ADI    2.0
AAL    1.0
Name: INFORMATION_SURPLUS_PCT, dtype: float64
idx = df.groupby(level=0)['INFORMATION_SURPLUS_PCT'].max().sort_values(ascending=False).index
print(idx)
Index(['ADI', 'AAL'], dtype='object', name='SYMBOL')
print(df.reindex(index=idx, level=0))
          INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL
ADI    0                  0.000000                      0.0
       1                 -0.010781                      2.0
       2                 -0.010414                      0.0
AAL    0                  0.000000                      0.0
       1                 -0.008466                      1.0
       2                 -0.011333                      0.0
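For anyone who wants to try the steps above without the original data, here is a minimal self-contained sketch. It assumes the frame can be rebuilt from the values shown in the question with pd.MultiIndex.from_product; the variable names group_max and result are only illustrative and do not come from the question.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
index = pd.MultiIndex.from_product(
    [["AAL", "ADI"], [0, 1, 2]], names=["SYMBOL", None]
)
df = pd.DataFrame(
    {
        "INFORMATION_SURPLUS_DIFF": [0.0, -0.008466, -0.011333,
                                     0.0, -0.010781, -0.010414],
        "INFORMATION_SURPLUS_PCT": [0.0, 1.0, 0.0, 0.0, 2.0, 0.0],
    },
    index=index,
)

# Maximum of the column within each first-level group.
group_max = df.groupby(level=0)["INFORMATION_SURPLUS_PCT"].max()

# First-level labels ordered by that maximum, largest first.
idx = group_max.sort_values(ascending=False).index

# Reorder the groups on level 0; rows inside each group keep their order.
result = df.reindex(idx, level=0)
print(result)
```

If two symbols share the same maximum, their relative order depends on the sort algorithm used by sort_values, so passing kind='stable' may be worth considering when ties matter.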