How do I turn a categorical column into level 0 of a column MultiIndex?
Suppose I have a DataFrame generated like this:
import pandas as pd
import numpy as np

df = pd.DataFrame({'dataset': ['dataset1']*2 + ['dataset2']*2 + ['dataset3']*2,
                   'frame': [1,2] * 3,
                   'result1': np.random.randn(6),
                   'result2': np.random.randn(6),
                   'result3': np.random.randn(6),
                   'method': ['A']*3 + ['B']*3
                   })
df = df.set_index(['dataset','frame'])
df
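How can I transform it so that the columns form a MultiIndex in which the values of the 'method' column are level 0? Entries that have no data for a given method should be filled in as missing values.
The end goal is to be able to easily compare the corresponding values in the 'result1', 'result2', and 'result3' columns between methods 'A' and 'B'.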
You can add method to the row MultiIndex with DataFrame.set_index, reshape with DataFrame.unstack, and finally use DataFrame.swaplevel together with DataFrame.sort_index:
df = df.set_index('method', append=True).unstack().swaplevel(1,0, axis=1).sort_index(axis=1)
print(df)
method                 A                             B
                 result1   result2   result3   result1   result2   result3
dataset  frame
dataset1 1      1.488609  1.130858  0.409016       NaN       NaN       NaN
         2      0.676011  0.645002  0.102751       NaN       NaN       NaN
dataset2 1     -0.418451  0.106414 -1.907722       NaN       NaN       NaN
         2           NaN       NaN       NaN -0.806521  0.422155  1.100224
dataset3 1           NaN       NaN       NaN  0.555876  0.124207 -1.402325
         2           NaN       NaN       NaN -0.705504 -0.837953 -0.225081
# if you need to remove the second level of the row index ('frame')
df = df.reset_index(level=1, drop=True)
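To make the chained call easier to follow, here is a minimal sketch that splits it into one step per method and then selects each method's block for comparison. The seeded generator and the names step1, step2, step3, wide, a_results, and b_results are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Seeded only so the sketch is reproducible (an assumption; the question uses np.random.randn).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'dataset': ['dataset1'] * 2 + ['dataset2'] * 2 + ['dataset3'] * 2,
    'frame': [1, 2] * 3,
    'result1': rng.standard_normal(6),
    'result2': rng.standard_normal(6),
    'result3': rng.standard_normal(6),
    'method': ['A'] * 3 + ['B'] * 3,
}).set_index(['dataset', 'frame'])

# Step 1: append 'method' to the row index as a third level.
step1 = df.set_index('method', append=True)

# Step 2: pivot the innermost row level ('method') into the columns.
# The columns are now (result, method) pairs; missing pairs become NaN.
step2 = step1.unstack()

# Step 3: swap the two column levels so 'method' becomes level 0.
step3 = step2.swaplevel(0, 1, axis=1)

# Step 4: sort the columns so each method's results sit next to each other.
wide = step3.sort_index(axis=1)

# Selecting a level-0 key returns a frame with plain result columns.
a_results = wide['A']
b_results = wide['B']
print(wide)
```

Because a_results and b_results share the same row index and column names, they can be lined up directly, although in this sample data each (dataset, frame) row has values for only one of the two methods.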