How to reshape a MultiIndex DataFrame
I have a MultiIndex DataFrame, and I want to turn its columns into rows and then name the resulting column after the measure.
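import numpy as np
import pandas as pd

# Two arrays of labels become the two levels of the row MultiIndex
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
          np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)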
                0         1         2         3
bar one  0.049409  0.533468  0.528360 -1.437937
    two  2.081377 -0.945535  0.237531 -0.781147
baz one  0.005216  1.158222 -1.178232 -1.470667
    two -0.043834 -0.320864 -1.568357  0.803620
foo one -0.758539 -1.009726  0.139992  0.281034
    two -1.806000  0.206872 -0.728195  1.051045
qux one -1.106591 -0.621868 -1.139649 -0.185527
    two  0.176220 -0.961532  3.587891  0.627658
I want my DataFrame to look like this:
           measure_name
bar one 0      0.049409
    two 1     -0.945535
    one 2      0.528360
    two 3     -0.781147
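I don't know how to do this. I tried pd.melt(), but that drops the MultiIndex, and I need a way to tie the values in the columns back to the index.
Thanks in advance!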
IIUC:
df.stack().to_frame('measure_name')
           measure_name
bar one 0      0.562183
        1      2.090766
        2     -0.164342
        3      0.499693
    two 0     -0.174269
        1     -0.997726
        2      0.820774
        3      0.243022
baz one 0     -0.158621
        1      0.520945
        2     -0.356393
        3      0.465289
    two 0     -1.187833
        1      0.886986
        2      1.415511
        3      0.940117
foo one 0     -0.010860
        1      0.126255
        2      1.131045
        3     -0.899853
    two 0     -1.121544
        1     -0.327184
        2      0.074396
        3      0.214501
qux one 0     -0.028317
        1     -1.476114
        2      1.415711
        3     -0.355655
    two 0      0.285167
        1      1.535384
        2      0.074326
        3     -1.860993
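To make it easier to see where each value ends up, here is a small sketch of the same stack() call on deterministic data. The frame name small and the np.arange values are assumptions made up for illustration; they are not from the question. stack() moves the column labels into a new innermost index level and returns a Series, and to_frame() turns that Series into a one-column DataFrame with the given name.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two'])]
# Hypothetical deterministic values (0..7) instead of random ones
small = pd.DataFrame(np.arange(8).reshape(4, 2), index=arrays)

stacked = small.stack()                    # Series with a three-level MultiIndex
result = stacked.to_frame('measure_name')  # one-column DataFrame

print(result.index.nlevels)                            # number of index levels
print(result.loc[('bar', 'two', 1), 'measure_name'])   # value from row ('bar', 'two'), column 1
```

The third index level holds the original column labels, so every value stays tied to its row labels and to the column it came from.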
If you want to use melt (though @piRSquared's solution is better):
(df.reset_index()
.melt(id_vars=['level_0','level_1'],value_name='measure_name')
.set_index(['level_0','level_1','variable'])
.rename_axis([None]*3)
.sort_index())
Output:
           measure_name
bar one 0     -1.442157
        1     -0.738047
        2     -1.724773
        3      0.952186
    two 0      0.470124
        1      0.296891
        2      0.208106
        3      1.050396
baz one 0      1.480720
        1      1.054237
        2     -0.195591
        3      0.994051
    two 0     -0.671022
        1     -0.587526
        2     -0.664228
        3      1.474525
foo one 0     -0.427713
        1     -0.083597
        2     -0.460711
        3      0.646449
    two 0     -0.140055
        1      1.029966
        2      0.431720
        3     -0.902373
qux one 0     -3.126427
        1      0.904205
        2     -0.592984
        3      1.812544
    two 0     -1.450957
        1      1.453259
        2     -0.929294
        3     -0.147798
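The outputs in this thread differ only because each run draws new random numbers. A rough way to check that the melt chain and the stack() one-liner agree is to run both on the same frame. In this sketch the names df_check, via_stack and via_melt, the smaller 4x3 shape, and the seed are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two'])]
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df_check = pd.DataFrame(rng.standard_normal((4, 3)), index=arrays)

# Route 1: stack the columns into the innermost index level
via_stack = df_check.stack().to_frame('measure_name')

# Route 2: melt, then rebuild the three-level index by hand
via_melt = (df_check.reset_index()
            .melt(id_vars=['level_0', 'level_1'], value_name='measure_name')
            .set_index(['level_0', 'level_1', 'variable'])
            .rename_axis([None] * 3)
            .sort_index())

same_values = np.allclose(via_stack['measure_name'].to_numpy(),
                          via_melt['measure_name'].to_numpy())
same_labels = list(via_stack.index) == list(via_melt.index)
print(same_values, same_labels)
```

The level_0 and level_1 names come from reset_index() on an unnamed MultiIndex; if the index levels had names, those names would have to be used in id_vars and set_index instead.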