Pandas MultiIndex: How to remove entire level that has zero positive values in specific column?
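I have this Pandas MultiIndex:
Is there a simple way to drop any level that has no positive values in the column INFORMATION_SURPLUS_PCT? In the example image, this would remove the AAPL level entirely.
Thanks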
I changed the DataFrame so it is easier to test:
print(df)
           INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL
AAL    0                   0.000000                 0.000000
       1                  -0.010875                 0.000000
       2                  -0.003659                 0.000000
       3                   0.007364                 0.000000
       4                  -0.018224                 0.000000
       5                   0.015290                 0.000000
       6                   0.067060                27.360990
       7                   0.028754                11.732043
       8                   0.021312                 0.000000
       9                   0.083284                33.980826
       10                  0.073214                29.872141
AAPL   0                   0.000000                 0.000000
       1                  -0.032254                 0.000000
       2                  -0.050695                 0.000000
       3                  -0.009713                 0.000000
       4                  -0.000673                 0.000000
       5                  -0.021018                 0.000000
AAPL1  6                  -0.061908                 0.000000
       7                  -0.029942                -1.000000
       8                  -0.074356                -1.000000
       9                  -0.154641                 0.000000
       10                 -0.137246                 0.000000
ADBE   0                   0.000000                 2.000000
       1                   0.000000                 0.000000
       2                   0.000000                 0.000000
idx = df[~(df['INFORMATION_SURPLUS_PCT'] <= 0).values].index.get_level_values('SYMBOL').unique()
print(idx)
['AAL' 'ADBE']
print(df.loc[(idx, slice(None)), :])
           INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL
AAL    0                   0.000000                 0.000000
       1                  -0.010875                 0.000000
       2                  -0.003659                 0.000000
       3                   0.007364                 0.000000
       4                  -0.018224                 0.000000
       5                   0.015290                 0.000000
       6                   0.067060                27.360990
       7                   0.028754                11.732043
       8                   0.021312                 0.000000
       9                   0.083284                33.980826
       10                  0.073214                29.872141
ADBE   0                   0.000000                 2.000000
       1                   0.000000                 0.000000
       2                   0.000000                 0.000000
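The snippet above relies on a df that already exists. Here is a minimal, self-contained sketch of the same approach for Python 3. It uses a shortened, hypothetical subset of the sample data; the chosen rows and the unnamed second index level are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical, shortened version of the sample frame (assumption: the
# second index level is unnamed, as in the printed output).
index = pd.MultiIndex.from_tuples(
    [("AAL", 5), ("AAL", 6), ("AAPL", 0), ("AAPL", 1), ("ADBE", 0), ("ADBE", 1)],
    names=["SYMBOL", None],
)
df = pd.DataFrame(
    {
        "INFORMATION_SURPLUS_DIFF": [0.015290, 0.067060, 0.0, -0.032254, 0.0, 0.0],
        "INFORMATION_SURPLUS_PCT": [0.0, 27.360990, 0.0, 0.0, 2.0, 0.0],
    },
    index=index,
)

# Rows where the column is not <= 0, i.e. the positive rows.
positive = ~(df["INFORMATION_SURPLUS_PCT"] <= 0)

# Unique SYMBOL labels that own at least one positive row.
idx = df[positive].index.get_level_values("SYMBOL").unique()

# Keep every row of those symbols; slice(None) selects the whole second level.
result = df.loc[(idx, slice(None)), :]
print(result)
```

In this reduced frame only AAPL has no positive value, so it is the only symbol dropped.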
Explanation:
# invert (~) the condition (<= 0) on column INFORMATION_SURPLUS_PCT
print(~(df['INFORMATION_SURPLUS_PCT'] <= 0))
SYMBOL
AAL    0     False
       1     False
       2     False
       3     False
       4     False
       5     False
       6      True
       7      True
       8     False
       9      True
       10     True
AAPL   0     False
       1     False
       2     False
       3     False
       4     False
       5     False
AAPL1  6     False
       7     False
       8     False
       9     False
       10    False
ADBE   0      True
       1     False
       2     False
Name: INFORMATION_SURPLUS_PCT, dtype: bool
# select all rows with a positive value in column INFORMATION_SURPLUS_PCT
print(df[~(df['INFORMATION_SURPLUS_PCT'] <= 0).values])
           INFORMATION_SURPLUS_DIFF  INFORMATION_SURPLUS_PCT
SYMBOL
AAL    6                   0.067060                27.360990
       7                   0.028754                11.732043
       9                   0.083284                33.980826
       10                  0.073214                29.872141
ADBE   0                   0.000000                 2.000000
# get the SYMBOL level value of each selected row
print(df[~(df['INFORMATION_SURPLUS_PCT'] <= 0).values].index.get_level_values('SYMBOL'))
Index([u'AAL', u'AAL', u'AAL', u'AAL', u'ADBE'], dtype='object', name=u'SYMBOL')
# keep only the unique SYMBOL values
idx = df[~(df['INFORMATION_SURPLUS_PCT'] <= 0).values].index.get_level_values('SYMBOL').unique()
print(idx)
['AAL' 'ADBE']
# select all rows belonging to those SYMBOL values
print(df.loc[(idx, slice(None)), :])
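As a possible alternative to the step-by-step selection above, the same result can be sketched with groupby(...).filter, which keeps only the SYMBOL groups containing at least one positive value. This is not the method used in the answer, and the small frame is a hypothetical subset of the sample data. One difference to be aware of: > 0 treats NaN as non-positive, while the inverted <= 0 condition counts NaN rows as matches.

```python
import pandas as pd

# Hypothetical, shortened version of the sample frame (assumption: the
# second index level is unnamed).
index = pd.MultiIndex.from_tuples(
    [("AAL", 5), ("AAL", 6), ("AAPL", 0), ("AAPL", 1), ("ADBE", 0), ("ADBE", 1)],
    names=["SYMBOL", None],
)
df = pd.DataFrame(
    {
        "INFORMATION_SURPLUS_DIFF": [0.015290, 0.067060, 0.0, -0.032254, 0.0, 0.0],
        "INFORMATION_SURPLUS_PCT": [0.0, 27.360990, 0.0, 0.0, 2.0, 0.0],
    },
    index=index,
)

# Group by the SYMBOL level and keep a group only if any of its
# INFORMATION_SURPLUS_PCT values is strictly positive.
result = df.groupby(level="SYMBOL").filter(
    lambda group: (group["INFORMATION_SURPLUS_PCT"] > 0).any()
)
print(result)
```

filter returns the surviving rows with their original index, so no separate lookup of SYMBOL labels is needed.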