Unpivoting a pandas DataFrame with MultiIndex columns
I have a DataFrame with a multi-level column index. The code below builds it. I have replaced all of the data with random numbers to avoid leaking sensitive information.
import pandas as pd
import numpy as np
import random
pd.options.display.max_columns=None
multi_index = [('Part Information', 'Brand'),
('Part Information', 'Model'),
('Part Information', 'Part Grouping'),
('Part Information', 'Part Desc'),
('Part Information', 'VWG Part Number(s)'),
('Part Information', 'VWG Retail Price'),
('Part Information', 'Trade Discount'),
('Part Information', 'VWG Trade Price'),
('Part Information', 'Is VWG Part Common or Unique?'),
('Part Information', 'VWG Volume'),
('Part Information', 'Competitor'),
('Q1 2018', 'Competitor Part Number'),
('Q1 2018', 'Competitor Brand'),
('Q1 2018', 'Competitor Retail Price'),
('Q1 2018', 'Competitor Trade Price'),
('Q2 2018', 'Competitor Part Number'),
('Q2 2018', 'Competitor Brand'),
('Q2 2018', 'Competitor Retail Price'),
('Q2 2018', 'Competitor Trade Price'),
('Q3 2018', 'Competitor Part Number'),
('Q3 2018', 'Competitor Brand'),
('Q3 2018', 'Competitor Retail Price'),
('Q3 2018', 'Competitor Trade Price'),
('Q4 2018', 'Competitor Part Number'),
('Q4 2018', 'Competitor Brand'),
('Q4 2018', 'Competitor Retail Price'),
('Q4 2018', 'Competitor Trade Price'),
('Q2 2019', 'Competitor Part Number'),
('Q2 2019', 'Competitor Brand'),
('Q2 2019', 'Competitor Retail Price'),
('Q2 2019', 'Competitor Trade Price'),
('Q3 2019', 'Competitor Part Number'),
('Q3 2019', 'Competitor Brand'),
('Q3 2019', 'Competitor Retail Price'),
('Q3 2019', 'Competitor Trade Price'),
('Q4 2019', 'Competitor Part Number'),
('Q4 2019', 'Competitor Brand'),
('Q4 2019', 'Competitor Retail Price'),
('Q4 2019', 'Competitor Trade Price')]
# Build the two-level column index once and attach it when creating the frame
columns = pd.MultiIndex.from_tuples(multi_index)
df = pd.DataFrame(np.random.randn(10, len(multi_index)), columns=columns)
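The resulting DataFrame is shown in the screenshots below (apologies for the multiple screenshots).
As you can see, I want to unpivot the repeated columns. I also want a 'QUARTER' column instead of the multi-index column structure, so the DataFrame should end up looking like this: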
cols = ['Brand', 'Model', 'Part Grouping', 'Part Desc', 'VWG Part Number(s)',
'VWG Retail Price', 'Trade Discount', 'VWG Trade Price',
'Is VWG Part Common or Unique?', 'VWG Volume', 'Competitor', 'QUARTER',
'Competitor Part Number', 'Competitor Brand', 'Competitor Retail Price',
'Competitor Trade Price']
df_new = pd.DataFrame(np.random.randn(10, 16), columns=cols)
df_new.loc[0:4, 'QUARTER'] = 'Q1 2018'
df_new.loc[4:8, 'QUARTER'] = 'Q2 2018'
df_new.loc[9, 'QUARTER'] = '...'
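How do I unpivot the column categories in level 0 of the column hierarchy? Should I use pd.melt() or DataFrame.stack()/unstack()?
Any help is much appreciated. Let me know if you need more information.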
IIUC, you can split the DataFrame in two, stack the right-hand part, and join it back onto the left-hand part:
(df['Part Information']
.join(df.drop(columns='Part Information', level=0)
.stack(0)
.rename_axis((None, 'Quarter'))
.reset_index(1))
)
Output:
Brand Model Part Grouping Part Desc VWG Part Number(s) VWG Retail Price Trade Discount VWG Trade Price Is VWG Part Common or Unique? VWG Volume Competitor Quarter Competitor Brand Competitor Part Number Competitor Retail Price Competitor Trade Price
0 1.163696 0.789552 -1.673217 -0.256159 0.299669 -1.918318 1.741297 -0.005605 1.085802 -0.775250 -0.800543 Q1 2018 0.668761 -0.266060 -1.018759 -0.755990
0 1.163696 0.789552 -1.673217 -0.256159 0.299669 -1.918318 1.741297 -0.005605 1.085802 -0.775250 -0.800543 Q2 2018 1.386664 -1.832704 1.325866 -0.123179
0 1.163696 0.789552 -1.673217 -0.256159 0.299669 -1.918318 1.741297 -0.005605 1.085802 -0.775250 -0.800543 Q2 2019 -0.612474 -0.250223 -1.299746 -0.870354
0 1.163696 0.789552 -1.673217 -0.256159 0.299669 -1.918318 1.741297 -0.005605 1.085802 -0.775250 -0.800543 Q3 2018 -1.553103 1.462980 1.578326 0.417629
...
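To try the same idea on something small, here is a minimal sketch on a reduced frame. The shortened column list, the random seed, and the names `fixed`, `quarterly`, and `result` are illustrative assumptions, not part of the original data.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame: one block of fixed columns
# plus one block of repeated columns per quarter (column list shortened).
columns = pd.MultiIndex.from_tuples([
    ("Part Information", "Brand"),
    ("Part Information", "Model"),
    ("Q1 2018", "Competitor Brand"),
    ("Q1 2018", "Competitor Retail Price"),
    ("Q2 2018", "Competitor Brand"),
    ("Q2 2018", "Competitor Retail Price"),
])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, len(columns))), columns=columns)

# Left part: selecting the level-0 key returns the fixed columns
# with a flat (single-level) column index.
fixed = df["Part Information"]

# Right part: move level 0 (the quarter) from the columns into the row index,
# name that new index level "Quarter", then turn it into an ordinary column.
quarterly = (
    df.drop(columns="Part Information", level=0)
      .stack(level=0)
      .rename_axis([None, "Quarter"])
      .reset_index(level="Quarter")
)

# Joining on the original row index repeats the fixed columns once per quarter.
result = fixed.join(quarterly).reset_index(drop=True)
print(result.shape)  # (6, 5): 3 parts x 2 quarters, 2 fixed + Quarter + 2 repeated columns
```

Each original row appears once per quarter, with the fixed columns repeated. The order of the quarters within each part may differ between pandas versions, so sort on `Quarter` explicitly if the order matters.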