How do I turn a categorical column into level 0 of a column MultiIndex?

Suppose I have a DataFrame like this:

It is generated with:

import pandas as pd
import numpy as np

df = pd.DataFrame({'dataset': ['dataset1']*2 + ['dataset2']*2 + ['dataset3']*2,
                   'frame': [1,2] * 3,
                   'result1': np.random.randn(6),
                   'result2': np.random.randn(6),
                   'result3': np.random.randn(6),
                   'method': ['A']*3 + ['B']*3
                  })
df = df.set_index(['dataset','frame'])
df

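How can I transform it so that it has MultiIndex columns in which the values from the 'method' column form level 0? Missing values should be filled in, for example like this:

The end goal is to be able to easily compare the corresponding values in the 'result1', 'result2' and 'result3' columns between methods 'A' and 'B'.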

You can append 'method' to the row MultiIndex with DataFrame.set_index, reshape with DataFrame.unstack, and finally use DataFrame.swaplevel together with DataFrame.sort_index:

df = df.set_index('method', append=True).unstack().swaplevel(1,0, axis=1).sort_index(axis=1)
print(df)
method                 A                             B                    
                 result1   result2   result3   result1   result2   result3
dataset  frame                                                            
dataset1 1      1.488609  1.130858  0.409016       NaN       NaN       NaN
         2      0.676011  0.645002  0.102751       NaN       NaN       NaN
dataset2 1     -0.418451  0.106414 -1.907722       NaN       NaN       NaN
         2           NaN       NaN       NaN -0.806521  0.422155  1.100224
dataset3 1           NaN       NaN       NaN  0.555876  0.124207 -1.402325
         2           NaN       NaN       NaN -0.705504 -0.837953 -0.225081

# if you need to remove the second level of the row index ('frame')
df = df.reset_index(level=1, drop=True)
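For the stated goal of comparing methods 'A' and 'B', the reshaped frame can be sliced by either column level. The sketch below uses small deterministic values instead of random ones so the result is easy to check by eye; `wide`, `results_a` and `result1_by_method` are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in data (an assumption; the question uses random values).
df = pd.DataFrame({'dataset': ['dataset1']*2 + ['dataset2']*2 + ['dataset3']*2,
                   'frame': [1, 2] * 3,
                   'result1': np.arange(6.0),
                   'result2': np.arange(6.0) + 10,
                   'result3': np.arange(6.0) + 20,
                   'method': ['A']*3 + ['B']*3
                  }).set_index(['dataset', 'frame'])

wide = (df.set_index('method', append=True)
          .unstack()
          .swaplevel(1, 0, axis=1)
          .sort_index(axis=1))

# All results for one method: selecting a level-0 key drops that level.
results_a = wide['A']
# One result across both methods: a cross-section on column level 1.
result1_by_method = wide.xs('result1', axis=1, level=1)
print(result1_by_method)
```

Because the second column level has no name, it is addressed by position (`level=1`) in the `xs` call.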