pandas: MultiIndex Slicing - Mixing slices and lists

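I'm trying to use the (not really) new MultiIndex slicers in pandas, but there is something I don't quite understand. Suppose I generate the following hierarchical DataFrame: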

import numpy as np
import pandas as pd

# Generate a container to hold the component DataFrames
df_list = []

# Generate names for the positions along the third dimension
third_names = ['front', 'middle', 'back']

# For each of the three positions in the third dimension...
for lab in third_names:
    # ...generate the corresponding section of raw data...
    d = pd.DataFrame(np.random.uniform(size=20).reshape(4, 5),
                     columns='a b c d e'.split(' '))
    # ...name the column dimension...
    d.columns.name = 'dim1'
    # ...generate the second and third dimensions (to go in the index)...
    d['dim2'] = ['one', 'two', 'three', 'four']
    d['dim3'] = lab
    # ...set the index...
    d.set_index(['dim3', 'dim2'], inplace=True)
    # ...and add the DataFrame to the container
    df_list.append(d)

# Concatenate the component DataFrames
d3 = pd.concat(df_list)

# Stack the columns into a third index level and sort the index
# (Series.sortlevel() was removed in pandas 1.0; sort_index(level=0) replaces it)
d3_long = d3.stack().sort_index(level=0)

print(d3_long)

Output:

dim3    dim2   dim1
back    four   a       0.501184
               b       0.627202
               c       0.329643
               d       0.484261
               e       0.884803
        one    a       0.834231
               b       0.918897
               c       0.196537
               d       0.242109
               e       0.860124
        three  a       0.782651
               b       0.998361
               c       0.849685
               d       0.210377
               e       0.866776
        two    a       0.908422
               b       0.737073
               c       0.064402
               d       0.240718
               e       0.044409
front   four   a       0.100877
               b       0.963870
               c       0.254075
               d       0.126556
               e       0.033631
        one    a       0.243552
               b       0.999168
               c       0.752251
               d       0.684718
               e       0.353013
        three  a       0.938928
               b       0.112993
               c       0.615178
               d       0.430318
               e       0.330437
        two    a       0.301921
               b       0.645425
               c       0.464172
               d       0.824765
               e       0.606823
middle  four   a       0.814888
               b       0.228860
               c       0.333184
               d       0.622176
               e       0.151248
        one    a       0.547780
               b       0.592404
               c       0.684111
               d       0.885605
               e       0.601560
        three  a       0.340951
               b       0.839149
               c       0.800098
               d       0.663753
               e       0.215224
        two    a       0.138430
               b       0.917627
               c       0.342968
               d       0.406744
               e       0.822957
dtype: float64
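As an aside, the same 3 × 4 × 5 layout can probably be built in one step with pd.MultiIndex.from_product instead of concatenating and stacking per-position frames. This is a sketch under that assumption, with a hypothetical variable name (alt_long) and fresh random values; it also shows that once sorted, the dim2 level is in alphabetical order (four, one, three, two) rather than one, two, three, four:

```python
import numpy as np
import pandas as pd

# Hypothetical alternative: build the 3 x 4 x 5 index directly
# instead of concatenating and stacking per-position DataFrames.
index = pd.MultiIndex.from_product(
    [["front", "middle", "back"], ["one", "two", "three", "four"], list("abcde")],
    names=["dim3", "dim2", "dim1"],
)
alt_long = pd.Series(np.random.uniform(size=len(index)), index=index)

# Label slicing across several levels needs a sorted (lexsorted) index;
# sort_index() orders every level alphabetically, as in d3_long.
alt_long = alt_long.sort_index()

print(alt_long.index.is_monotonic_increasing)  # True
print(list(alt_long.index.levels[1]))  # ['four', 'one', 'three', 'two']
```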

I get the behavior I expect on the first two dimensions...

print(d3_long.loc[(slice('front','middle'),slice('two','four')),:])

Output:

dim3    dim2   dim1
front   four   a       0.100877
               b       0.963870
               c       0.254075
               d       0.126556
               e       0.033631
        one    a       0.243552
               b       0.999168
               c       0.752251
               d       0.684718
               e       0.353013
        three  a       0.938928
               b       0.112993
               c       0.615178
               d       0.430318
               e       0.330437
        two    a       0.301921
               b       0.645425
               c       0.464172
               d       0.824765
               e       0.606823
middle  four   a       0.814888
               b       0.228860
               c       0.333184
               d       0.622176
               e       0.151248
        one    a       0.547780
               b       0.592404
               c       0.684111
               d       0.885605
               e       0.601560
        three  a       0.340951
               b       0.839149
               c       0.800098
               d       0.663753
               e       0.215224
        two    a       0.138430
               b       0.917627
               c       0.342968
               d       0.406744
               e       0.822957
dtype: float64

However, the following call produces exactly the same result:

d3_long.loc[(slice('front','middle'),slice('two','four'),slice('b','d')),:]

It is as if the third level of the MultiIndex is being ignored. When I try to use a list to pick specific labels on that level...

d3_long.loc[(slice('front','middle'),slice('two','four'),['b','d']),:]

it raises a TypeError. Any ideas?

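d3_long is actually a Series, so you don't need the trailing : in the slicer. Also note that your second-level slice('two','four') doesn't select anything: after sorting, the dim2 labels are in alphabetical order (four, one, three, two), so 'two' comes after 'four' and the slice runs backwards (equivalent to [-1:1]).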
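A minimal sketch of both points, assuming a recent pandas version and a hypothetical toy Series s with the same three levels as d3_long (not the data above): the dim2 level sorts alphabetically, and the tuple key for a Series needs no trailing :.

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series with the same three sorted index levels as d3_long.
index = pd.MultiIndex.from_product(
    [["back", "front", "middle"], ["four", "one", "three", "two"], list("abcde")],
    names=["dim3", "dim2", "dim1"],
)
s = pd.Series(np.arange(len(index), dtype=float), index=index)

# The dim2 labels are sorted alphabetically, so 'two' comes after 'four'.
print(list(s.index.levels[1]))  # ['four', 'one', 'three', 'two']

# A Series has a single axis, so the key is just the tuple (no trailing ':').
# slice('two', 'four') runs backwards over the sorted labels and matches nothing.
empty = s.loc[(slice("front", "middle"), slice("two", "four"), slice(None))]
print(len(empty))  # 0
```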

But if you reverse the order, it should give you what you expect:

In [82]: d3_long.loc[slice('front','middle'),slice('four','two'), ['b','d']]
Out[82]: 
dim3    dim2   dim1
front   four   b       0.301573
               d       0.478005
        one    b       0.306292
               d       0.281984
        three  b       0.108174
               d       0.776523
        two    b       0.028694
               d       0.527417
middle  four   b       0.285103
               d       0.647165
        one    b       0.807411
               d       0.309446
        three  b       0.277752
               d       0.939555
        two    b       0.470019
               d       0.447640
dtype: float64
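The same selection can likely also be written with pd.IndexSlice, which allows ordinary start:stop syntax inside the key. The sketch below uses a hypothetical toy Series s with random values (not the data above). Its last part shows an order-independent alternative, boolean masks built with isin on each level's values, which avoids relying on the alphabetical ordering of dim2 at all:

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series shaped like d3_long (3 x 4 x 5, sorted index).
index = pd.MultiIndex.from_product(
    [["back", "front", "middle"], ["four", "one", "three", "two"], list("abcde")],
    names=["dim3", "dim2", "dim1"],
)
s = pd.Series(np.random.uniform(size=len(index)), index=index)

idx = pd.IndexSlice
# Same as s.loc[slice('front','middle'), slice('four','two'), ['b','d']]
picked = s.loc[idx["front":"middle", "four":"two", ["b", "d"]]]
print(len(picked))  # 16

# Order-independent alternative: combine boolean masks on level values.
mask = (
    s.index.get_level_values("dim3").isin(["front", "middle"])
    & s.index.get_level_values("dim1").isin(["b", "d"])
)
print(s[mask].equals(picked))  # True
```

The IndexSlice form still depends on the index being sorted; the mask form works on an unsorted index too, at the cost of scanning every row.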