Changing the order of columns below merged (cell) columns
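I need to change the order of the columns within each year. It needs to be ['profit', 'loss', 'other', 'status', 'index']. But I can't do it, because there is a header row above them that marks the year, and I need to keep it. The usual solutions don't work! Here is my code, which does not give the order I want.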
import pandas as pd
import numpy as np
df = pd.read_excel('products2.xlsx', index_col=[0])
df.columns = df.columns.str.split('_', expand=True)
new_data = df.stack(0)
new_data1 = new_data.eval('status = profit - loss + other')
new_data2 = new_data1.eval('index = (profit / status) / (loss / status)')
# if I put the ordering here, it works temporarily, but it is lost by the time the output is built, so I don't get the order I want
order = new_data2.reindex(columns=['profit', 'loss', 'other', 'status', 'index'])
output = new_data2.unstack(1).swaplevel(0,1, axis=1).sort_index(axis=1)
# I should probably put it here, but that destroys the whole table
output.to_excel('output_products.xlsx')
Here is a Dropbox link to the Excel document.
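If I understand your question correctly, you want to change the order of the columns, but only on level 1 of the column index. If that is the case, you can try reindexing like this: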
output = output.reindex(columns=['profit', 'loss', 'other', 'status', 'index'], level=1)
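A minimal sketch of that call on a small made-up frame (the two years, the two companies and the numbers are illustrative assumptions, not the asker's data). With level=1 the listed labels are matched against the inner column level only, so the year level on top should stay in place:

```python
import pandas as pd

# Hypothetical stand-in for the real table: two years, five metrics per year.
years = ["2017", "2018"]
metrics = ["index", "status", "profit", "loss", "other"]   # order as loaded
wanted = ["profit", "loss", "other", "status", "index"]    # order we want

columns = pd.MultiIndex.from_product([years, metrics])
df = pd.DataFrame(
    [list(range(10)), list(range(10, 20))],
    index=["company1", "company2"],
    columns=columns,
)

# Reorder only the inner (level 1) labels; the years are left grouped as they are.
reordered = df.reindex(columns=wanted, level=1)

print(list(reordered.columns))
```

Each value travels with its column label, so only the column order changes, not the data.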
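Because you have a MultiIndex and the order you want for the level-1 column index is not alphabetical (neither ascending nor descending), a plain sort_index will not produce it. One way around this is to turn the level-1 column index into a categorical index, using MultiIndex.set_levels and pd.CategoricalIndex. Then you can call sort_index on the columns to get the column order you want: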
df.columns = df.columns.set_levels(
    pd.CategoricalIndex(
        df.columns.levels[1],
        categories=['profit', 'loss', 'other', 'status', 'index'],
        ordered=True),
    level=1)
df = df.sort_index(axis=1)
Demo
Input data
Because the sample Excel file in your Dropbox has only the 3 columns profit, loss and other, and is missing the 2 columns status and index, I added the 2 missing columns as follows:
print(df)
2017_index 2017_status 2017_profit 2017_loss 2017_other 2018_index 2018_status 2018_profit 2018_loss 2018_other 2019_index 2019_status 2019_profit 2019_loss 2019_other 2020_index 2020_status 2020_profit 2020_loss 2020_other
companies
company1 1 Ready 3000 500 3000 1 Ready 3500 2000 5000 1 Ready 3000 3000 4000 1 Ready 4400 3000 4000
company2 2 Ready 2900 600 2800 2 Ready 3300 1400 3400 2 Ready 3000 2000 300 2 Ready 3200 1700 500
company3 3 Ready 3100 900 3200 3 Ready 2900 2000 2400 3 Ready 3500 400 2000 3 Ready 5000 3100 1500
company4 4 Ready 2000 2000 3100 4 Ready 4100 1800 400 4 Ready 3000 3000 1900 4 Ready 4400 2200 1300
company5 5 Ready 1400 100 500 5 Ready 2000 3000 1300 5 Ready 2500 800 2000 5 Ready 3000 1200 1700
company6 6 Ready 2000 800 800 6 Ready 1500 2000 4000 6 Ready 2000 600 2000 6 Ready 3000 3400 3600
company7 7 Ready 2700 1500 1900 7 Ready 3000 2000 4400 7 Ready 2000 5000 3000 7 Ready 2000 3500 1200
company8 8 Ready 1300 1100 3000 8 Ready 1600 1800 3200 8 Ready 3000 2000 200 8 Ready 2700 2000 1400
Then, following your code, split the column index into a MultiIndex:
df.columns = df.columns.str.split('_', expand=True)
We get:
print(df)
2017 2018 2019 2020
index status profit loss other index status profit loss other index status profit loss other index status profit loss other
companies
company1 1 Ready 3000 500 3000 1 Ready 3500 2000 5000 1 Ready 3000 3000 4000 1 Ready 4400 3000 4000
company2 2 Ready 2900 600 2800 2 Ready 3300 1400 3400 2 Ready 3000 2000 300 2 Ready 3200 1700 500
company3 3 Ready 3100 900 3200 3 Ready 2900 2000 2400 3 Ready 3500 400 2000 3 Ready 5000 3100 1500
company4 4 Ready 2000 2000 3100 4 Ready 4100 1800 400 4 Ready 3000 3000 1900 4 Ready 4400 2200 1300
company5 5 Ready 1400 100 500 5 Ready 2000 3000 1300 5 Ready 2500 800 2000 5 Ready 3000 1200 1700
company6 6 Ready 2000 800 800 6 Ready 1500 2000 4000 6 Ready 2000 600 2000 6 Ready 3000 3400 3600
company7 7 Ready 2700 1500 1900 7 Ready 3000 2000 4400 7 Ready 2000 5000 3000 7 Ready 2000 3500 1200
company8 8 Ready 1300 1100 3000 8 Ready 1600 1800 3200 8 Ready 3000 2000 200 8 Ready 2700 2000 1400
Run the solution code:
df.columns = df.columns.set_levels(
    pd.CategoricalIndex(
        df.columns.levels[1],
        categories=['profit', 'loss', 'other', 'status', 'index'],
        ordered=True),
    level=1)
df = df.sort_index(axis=1)
Result:
print(df)
2017 2018 2019 2020
profit loss other status index profit loss other status index profit loss other status index profit loss other status index
companies
company1 3000 500 3000 Ready 1 3500 2000 5000 Ready 1 3000 3000 4000 Ready 1 4400 3000 4000 Ready 1
company2 2900 600 2800 Ready 2 3300 1400 3400 Ready 2 3000 2000 300 Ready 2 3200 1700 500 Ready 2
company3 3100 900 3200 Ready 3 2900 2000 2400 Ready 3 3500 400 2000 Ready 3 5000 3100 1500 Ready 3
company4 2000 2000 3100 Ready 4 4100 1800 400 Ready 4 3000 3000 1900 Ready 4 4400 2200 1300 Ready 4
company5 1400 100 500 Ready 5 2000 3000 1300 Ready 5 2500 800 2000 Ready 5 3000 1200 1700 Ready 5
company6 2000 800 800 Ready 6 1500 2000 4000 Ready 6 2000 600 2000 Ready 6 3000 3400 3600 Ready 6
company7 2700 1500 1900 Ready 7 3000 2000 4400 Ready 7 2000 5000 3000 Ready 7 2000 3500 1200 Ready 7
company8 1300 1100 3000 Ready 8 1600 1800 3200 Ready 8 3000 2000 200 Ready 8 2700 2000 1400 Ready 8
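To tie this back to the question's own pipeline: the reordering step most likely belongs at the very end, after the final sort_index(axis=1), because that sort puts the inner labels back into alphabetical order. The sketch below is an assumption-laden reconstruction, not the asker's exact script: it uses a tiny made-up table instead of products2.xlsx, computes status and index with plain column assignment rather than eval, and uses the level-1 reindex for the last step.

```python
import pandas as pd

# Hypothetical miniature of the spreadsheet: <year>_<metric> column names.
wide = pd.DataFrame(
    {
        "2017_profit": [3000, 2900], "2017_loss": [500, 600], "2017_other": [3000, 2800],
        "2018_profit": [3500, 3300], "2018_loss": [2000, 1400], "2018_other": [5000, 3400],
    },
    index=pd.Index(["company1", "company2"], name="companies"),
)
wanted = ["profit", "loss", "other", "status", "index"]

# Split "2017_profit" into a (year, metric) MultiIndex, then move the year into the rows.
wide.columns = wide.columns.str.split("_", expand=True)
long = wide.stack(0)

# Same formulas as in the question, written as ordinary column arithmetic.
long["status"] = long["profit"] - long["loss"] + long["other"]
long["index"] = (long["profit"] / long["status"]) / (long["loss"] / long["status"])

# Bring the year back on top; sort_index leaves the inner labels alphabetical.
output = long.unstack(1).swaplevel(0, 1, axis=1).sort_index(axis=1)

# Reorder the inner level last, so no later step can undo it.
output = output.reindex(columns=wanted, level=1)

print(list(output.columns))
```

Reordering new_data2 before unstack does not survive, since the later sort_index(axis=1) re-sorts the columns, which matches the behaviour described in the question's first comment.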