How can I drop data from a MultiIndex DataFrame without using all levels in the .drop command?

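I have a DataFrame and I only want to drop the rows indexed by 'car' and 'p1'. However, when I use the .drop function I have to pass all four leading index levels, 'car','valueA','row','p1', to drop the data I want. How can I drop data from a MultiIndex DataFrame with a command like this: dataFrame.drop(('car',None,None,'p1'), axis=0, inplace=True)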

Here is my code and DataFrame. Currently I drop the rows by passing the full key 'car','valueA','row','p1':

Code:

import numpy as np
import pandas as pd

# multiindex array
arr = [np.array(['car', 'car', 'car','car', 'car', 'car', 'car', 'car', 'car', 'truck', 'truck', 'truck', 'truck', 'truck', 'truck','truck', 'truck', 'truck','bike','bike', 'bike','bike','bike', 'bike','bike','bike', 'bike']),
       np.array(['valueA', 'valueA','valueA', 'valueA','valueA', 'valueA','valueA', 'valueA','valueA','valueB','valueB','valueB','valueB','valueB','valueB','valueB','valueB','valueB', 'valueC','valueC','valueC','valueC','valueC','valueC','valueC','valueC','valueC']),
       np.array(['row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row','row']),
       np.array(['p1','p1','p1','p2','p2','p2','p3','p3','p3','p1','p1','p1','p2','p2','p2','p3','p3','p3','p1','p1','p1','p2','p2','p2','p3','p3','p3',]),
       np.array(['1','2','3','1','2','3','1','2','3','1','2','3','1','2','3','1','2','3','1','2','3','1','2','3','1','2','3',])]

# forming multiindex dataframe
dataFrame = pd.DataFrame(
   np.random.randn(27, 3), index=arr,columns=['Col 1', 'Col 2', 'Col 3'])

dataFrame.index.names = ['level 0', 'level 1','level 2','level 3','level 4']
print(dataFrame)

print("\nDropping specific row...\n")
dataFrame.drop(('car','valueA','row','p1'), axis=0, inplace=True)
print(dataFrame)

DataFrame after the drop:

                                            Col 1     Col 2     Col 3
level 0 level 1 level 2 level 3 level 4                              
car     valueA  row     p2      1       -0.202113  0.475475  0.871960
                                2        0.776150  1.435102 -0.756707
                                3        0.117550  0.120139  0.718093
                        p3      1       -1.141276 -0.656897  1.296046
                                2        1.632846  1.689873 -0.992740
                                3        0.207730 -0.007627  0.331016
truck   valueB  row     p1      1       -0.510714 -0.471667  1.423341
                                2       -0.753657  0.352551  0.688307
                                3       -0.824962  0.729206  0.295181
                        p2      1       -1.668048  0.883333  0.077169
                                2        0.496375  0.002827  0.202063
                                3        1.446275 -0.349694 -1.215787
                        p3      1        0.609428  2.184825  1.619343
                                2        0.039672 -0.338794 -1.023429
                                3        1.583751 -0.931371  0.784551
bike    valueC  row     p1      1       -0.896791  0.049717  1.555789
                                2        0.117095  1.407567  1.398970
                                3        0.813442  0.440550 -0.808965
                        p2      1        0.984040 -0.347328 -1.139446
                                2       -0.363173 -0.710894  2.973986
                                3       -0.810208  0.004661 -0.006106
                        p3      1        1.247540 -1.260834  0.139684
                                2        0.609170  1.841452  0.965086
                                3       -0.648415 -0.138171  0.697330
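For contrast, here is a minimal sketch of what .drop can and cannot do on its own. It assumes a compact rebuild of the same 5-level index with np.repeat/np.tile and seeded random data (neither is in the original); the names partial and by_level are illustrative. A partial tuple key only matches the leading levels, while the level= parameter removes one label from one level for every vehicle, so 'p1' disappears for truck and bike as well. That is why neither call answers the question by itself.

```python
import numpy as np
import pandas as pd

# Assumed compact rebuild of the question's 5-level index:
# np.repeat/np.tile reproduce the label order of the explicit arrays.
arr = [np.repeat(['car', 'truck', 'bike'], 9),
       np.repeat(['valueA', 'valueB', 'valueC'], 9),
       np.repeat(['row'], 27),
       np.tile(np.repeat(['p1', 'p2', 'p3'], 3), 3),
       np.tile(['1', '2', '3'], 9)]
dataFrame = pd.DataFrame(np.random.default_rng(0).standard_normal((27, 3)),
                         index=arr, columns=['Col 1', 'Col 2', 'Col 3'])
dataFrame.index.names = ['level 0', 'level 1', 'level 2', 'level 3', 'level 4']

# The call from the question: a partial key matches the leading levels,
# so only the 3 'car'/'p1' rows are removed.
partial = dataFrame.drop(('car', 'valueA', 'row', 'p1'))

# level= drops one label on one level, but for every vehicle:
# all 9 'p1' rows go, not only the 3 that belong to 'car'.
by_level = dataFrame.drop('p1', level='level 3')

print(partial.shape)   # (24, 3)
print(by_level.shape)  # (18, 3)
```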

Prerequisite: IndexSlice

You can easily slice the rows of a MultiIndex with pandas.IndexSlice, using : for every level you don't care about:

idx = pd.IndexSlice
dataFrame.loc[idx['car', :, :, 'p1'], :]  # row key for the MultiIndex, then ':' for all columns

Output:

                                            Col 1     Col 2     Col 3
level 0 level 1 level 2 level 3 level 4                              
car     valueA  row     p1      1          0.7433    0.7007    1.0691
                                2         -1.1336   -1.0243   -0.6874
                                3          0.2181    0.1967    1.6890
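pd.IndexSlice is only syntactic sugar: idx['car', :, :, 'p1'] evaluates to a plain tuple with slice(None) in the wildcard positions, which is essentially the ('car', None, None, 'p1') syntax the question asked for. The sketch below checks that, selects with an explicit column indexer (the form the pandas docs recommend to avoid ambiguity), and cross-checks the result against DataFrame.xs with a list of levels. The compact rebuild, the seeded data, the sort_index() call and the names key, selected and via_xs are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed compact rebuild of the question's frame (seeded for repeatability).
arr = [np.repeat(['car', 'truck', 'bike'], 9),
       np.repeat(['valueA', 'valueB', 'valueC'], 9),
       np.repeat(['row'], 27),
       np.tile(np.repeat(['p1', 'p2', 'p3'], 3), 3),
       np.tile(['1', '2', '3'], 9)]
dataFrame = pd.DataFrame(np.random.default_rng(0).standard_normal((27, 3)),
                         index=arr, columns=['Col 1', 'Col 2', 'Col 3'])
dataFrame.index.names = ['level 0', 'level 1', 'level 2', 'level 3', 'level 4']
# A sorted index is the usual precondition for MultiIndex slicing.
dataFrame = dataFrame.sort_index()

idx = pd.IndexSlice
key = idx['car', :, :, 'p1']
# IndexSlice hands back the tuple unchanged; each ':' becomes slice(None).
assert key == ('car', slice(None), slice(None), 'p1')

# Row key for the MultiIndex, ':' for all columns.
selected = dataFrame.loc[key, :]

# Same rows via xs: 'car' on level 0 and 'p1' on level 3,
# keeping all five index levels in the result.
via_xs = dataFrame.xs(('car', 'p1'), level=['level 0', 'level 3'],
                      drop_level=False)

print(selected.shape)           # (3, 3)
print(selected.equals(via_xs))  # True
```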

Now, the drop:

To drop, use the selection above to get the index of the rows you want to remove, then pass that index to .drop:

to_drop = dataFrame.loc[idx['car', :, :, 'p1'], :].index  # MultiIndex of the rows to remove
dataFrame.drop(to_drop)  # add inplace=True to modify dataFrame in place

Output:

                                            Col 1     Col 2     Col 3
level 0 level 1 level 2 level 3 level 4                              
car     valueA  row     p2      1          0.3053   -1.3057   -0.1287
                                2          2.5257   -1.6639   -0.5921
                                3          0.8080   -0.2103   -1.1286
                        p3      1         -0.7016    0.1553    2.1906
                                2          0.5787    0.2155   -1.0574
                                3         -0.4153    0.1872    0.2001
truck   valueB  row     p1      1         -1.2780    1.3715   -0.0653
                                2          0.2365   -0.0084   -0.4676
                                3          0.7442    0.0395    1.2570
                        p2      1          0.2128    0.0567   -0.6916
                                2         -0.7449   -0.3231   -1.3954
                                3         -0.3366   -2.1328   -0.9524
                        p3      1         -0.1372   -2.3368    0.3554
                                2         -0.3781   -0.9169    0.2724
                                3         -0.0303    0.2812   -1.0810
bike    valueC  row     p1      1         -0.4342    0.9801    0.2852
                                2          0.9794    0.7521   -0.6850
                                3          0.6731   -1.2610    1.0722
                        p2      1          1.0940    0.4086    0.9345
                                2          0.1387    0.7512   -1.0006
                                3         -0.1079   -0.1318    0.9483
                        p3      1         -0.8483   -0.7513   -0.2429
                                2         -1.6328    1.8877   -0.5835
                                3          1.1729   -1.0088    1.0520
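An alternative that needs neither IndexSlice nor a sorted index is a boolean mask built from Index.get_level_values: compare only the levels you care about, and the remaining levels behave like wildcards. The mask can either filter the frame directly or produce the labels to pass to .drop. A minimal sketch, again assuming the compact rebuild and seeded data; mask, result and dropped are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumed compact rebuild of the question's frame (seeded for repeatability).
arr = [np.repeat(['car', 'truck', 'bike'], 9),
       np.repeat(['valueA', 'valueB', 'valueC'], 9),
       np.repeat(['row'], 27),
       np.tile(np.repeat(['p1', 'p2', 'p3'], 3), 3),
       np.tile(['1', '2', '3'], 9)]
dataFrame = pd.DataFrame(np.random.default_rng(0).standard_normal((27, 3)),
                         index=arr, columns=['Col 1', 'Col 2', 'Col 3'])
dataFrame.index.names = ['level 0', 'level 1', 'level 2', 'level 3', 'level 4']

# Label of every row on the two levels we care about.
lvl0 = dataFrame.index.get_level_values('level 0')
lvl3 = dataFrame.index.get_level_values('level 3')

# True only for rows that are 'car' on level 0 AND 'p1' on level 3;
# levels 1, 2 and 4 are not inspected at all.
mask = (lvl0 == 'car') & (lvl3 == 'p1')

# Option 1: keep the complement of the mask.
result = dataFrame[~mask]

# Option 2: turn the mask into index labels and hand them to .drop.
dropped = dataFrame.drop(dataFrame.index[mask])

print(result.shape)            # (24, 3)
print(result.equals(dropped))  # True
```

Assign either result back to dataFrame (or pass inplace=True to .drop) to make the change permanent.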