Pandas Pivot Table with multilevel index
I have a df containing articles and their yearly sales. I want to turn it into a pivot table, but with two levels of index.
My df:
date brand_id brand_name art_id art_name count_art
2015 1 cat 10 A 120
2016 1 cat 10 A 100
2017 1 cat 12 B 80
2015 2 dog 20 C 100
2016 2 dog 25 D 110
2015 3 bird 30 E 50
2017 3 bird 31 F 90
The result I want looks like this:
2015 2016 2017
brand_id brand_name art_id art_name count_art art_id art_name count_art art_id art_name count_art
1 cat 10 A 120 10 A 100 12 B 80
2 dog 20 C 100 25 D 110 null null null
3 bird 30 E 50 null null null 31 F 90
So far I have tried the following command:
transformed_data = df.pivot_table(values=['art_id', 'art_name', 'count_art'], index=['brand_id', 'brand_name'], columns='date', aggfunc='first')
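But it doesn't work as expected. I know how to turn rows into per-year columns, but I don't know how to turn multiple columns from multiple rows into a single row with more columns.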
IIUC, use the pivot_table command to include the values in the desired order. Then use swaplevel to reorder your column levels, and sort_index with sort_remaining=False to ensure that only the dates are sorted:
new_cols = ['art_id', 'art_name', 'count_art']
transformed_data = (
    df.pivot_table(values=new_cols,
                   index=['brand_id', 'brand_name'],
                   columns=['date'], aggfunc='first')
    [new_cols]
    .swaplevel(axis=1)
    .sort_index(level=0, axis=1, sort_remaining=False)
)
Output:
date 2015 2016 2017
art_id art_name count_art art_id art_name count_art art_id art_name count_art
brand_id brand_name
1 cat 10.0 A 120.0 10.0 A 100.0 12.0 B 80.0
2 dog 20.0 C 100.0 25.0 D 110.0 NaN NaN NaN
3 bird 30.0 E 50.0 NaN NaN NaN 31.0 F 90.0
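For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the rest follows the snippet above. The two print calls at the end are only there for inspection and are not part of the answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "date": [2015, 2016, 2017, 2015, 2016, 2015, 2017],
    "brand_id": [1, 1, 1, 2, 2, 3, 3],
    "brand_name": ["cat", "cat", "cat", "dog", "dog", "bird", "bird"],
    "art_id": [10, 10, 12, 20, 25, 30, 31],
    "art_name": ["A", "A", "B", "C", "D", "E", "F"],
    "count_art": [120, 100, 80, 100, 110, 50, 90],
})

new_cols = ["art_id", "art_name", "count_art"]

transformed_data = (
    # One row per brand, one column per (value column, date) pair.
    df.pivot_table(values=new_cols,
                   index=["brand_id", "brand_name"],
                   columns=["date"], aggfunc="first")
    # Put the value columns in the requested order.
    [new_cols]
    # Move the date level to the top of the column MultiIndex.
    .swaplevel(axis=1)
    # Sort on the date level only, leaving the inner level as it is.
    .sort_index(level=0, axis=1, sort_remaining=False)
)

print(transformed_data.columns.tolist())
print(transformed_data[2017])
```

Selecting a single year, as in the last line, returns a plain three-column frame for that year, which is a quick way to check that the levels ended up in the intended order.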
Add DataFrame.swaplevel with DataFrame.sort_index:
df = (df.pivot_table(values=['art_id', 'art_name', 'count_art'],
                     index=['brand_id', 'brand_name'],
                     columns='date',
                     aggfunc='first')
        .swaplevel(1, 0, axis=1)
        .sort_index(level=0, axis=1, sort_remaining=False))
print(df)
date 2015 2016 \
art_id art_name count_art art_id art_name count_art
brand_id brand_name
1 cat 10.0 A 120.0 10.0 A 100.0
2 dog 20.0 C 100.0 25.0 D 110.0
3 bird 30.0 E 50.0 NaN NaN NaN
date 2017
art_id art_name count_art
brand_id brand_name
1 cat 12.0 B 80.0
2 dog NaN NaN NaN
3 bird 31.0 F 90.0
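As a possible alternative that is not taken from either answer, the same layout can be sketched with set_index and unstack instead of pivot_table. This assumes every (brand_id, brand_name, date) combination occurs at most once: unstack raises a ValueError on duplicates, whereas pivot_table with aggfunc='first' would quietly keep the first row of each group. The names value_cols and wide are made up for this illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "date": [2015, 2016, 2017, 2015, 2016, 2015, 2017],
    "brand_id": [1, 1, 1, 2, 2, 3, 3],
    "brand_name": ["cat", "cat", "cat", "dog", "dog", "bird", "bird"],
    "art_id": [10, 10, 12, 20, 25, 30, 31],
    "art_name": ["A", "A", "B", "C", "D", "E", "F"],
    "count_art": [120, 100, 80, 100, 110, 50, 90],
})

# Hypothetical helper name: the value columns in the requested order.
value_cols = ["art_id", "art_name", "count_art"]

wide = (
    # Assumption: (brand_id, brand_name, date) is unique per row.
    df.set_index(["brand_id", "brand_name", "date"])[value_cols]
    # Move the date index level into the columns.
    .unstack("date")
    # Put the date level on top of the value-column level.
    .swaplevel(axis=1)
    # Sort by date only, keeping the order of the value columns.
    .sort_index(level=0, axis=1, sort_remaining=False)
)

print(wide)
```

Because no aggregation function is involved, this variant fails loudly when a brand has two rows for the same year, which may or may not be what you want.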