Reshape Pandas DataFrame with TimeSeries in rows instead of columns

Question

I have a DataFrame df that contains price data (open, close, high, low) for every day between January 2010 and December 2021:

Name	ISIN	Data	02.01.2010	05.01.2010	06.01.2010	...	31.12.2021
Apple	US9835635986	Price Open	12.45	13.45	12.48	...	54.12
Apple	US9835635986	Price Close	12.58	15.35	12.38	...	54.43
Apple	US9835635986	Price High	12.78	15.85	12.83	...	54.91
Apple	US9835635986	Price Low	12.18	13.35	12.21	...	53.98
Microsoft	US1223928384	Price Open	12.45	13.45	12.48	...	43.56
...	..	...	...	...	...	...	...

I am trying to reshape the table into the following format:

Date	Name	ISIN	Price Open	Price Close	Price High	Price Low
02.01.2010	Apple	US9835635986	12.45	12.58	12.78	12.18
05.01.2010	Apple	US9835635986	13.45	15.35	15.85	13.35
...	...	...	...	...	...	...
02.01.2010	Microsoft	US1223928384	12.45	13.67	13.74	12.35

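Simply transposing the DataFrame does not work. I also tried pivot, which fails with an error message saying the operands could not be broadcast together with different shapes.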

dates = ['NAME','ISIN']
dates.append(df.columns.tolist()[3:]) # appends all columns names starting with 02.01.2010
df.pivot(index = dates, columns = 'Data', Values = 'Data')

How can I get the DataFrame into the desired format?

Answer 1

Use DataFrame.melt to move the date columns into rows, convert the dates to datetimes, pivot on Data, and finally sort the MultiIndex:

df = (df.melt(['Name','ISIN','Data'], var_name='Date')
        .assign(Date = lambda x: pd.to_datetime(x['Date'], format='%d.%m.%Y'))
        .pivot(index = ['Date','Name','ISIN'], columns = 'Data', values = 'value')
        .sort_index(level=[1,2,0])
        .reset_index()
        )
print (df)
Data       Date       Name          ISIN  Price Close  Price High  Price Low  \
0    2010-01-02      Apple  US9835635986        12.58       12.78      12.18   
1    2010-01-05      Apple  US9835635986        15.35       15.85      13.35   
2    2010-01-06      Apple  US9835635986        12.38       12.83      12.21   
3    2021-12-31      Apple  US9835635986        54.43       54.91      53.98   
4    2010-01-02  Microsoft  US1223928384          NaN         NaN        NaN   
5    2010-01-05  Microsoft  US1223928384          NaN         NaN        NaN   
6    2010-01-06  Microsoft  US1223928384          NaN         NaN        NaN   
7    2021-12-31  Microsoft  US1223928384          NaN         NaN        NaN   

Data  Price Open  
0          12.45  
1          13.45  
2          12.48  
3          54.12  
4          12.45  
5          13.45  
6          12.48  
7          43.56
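
For readers who want to try the melt/pivot approach in isolation, here is a minimal, self-contained sketch. The frame `wide` is a hypothetical three-date sample built from the Apple rows in the question, and the names `wide`, `long`, `tidy` and `wanted` are introduced only for illustration. It assumes pandas 1.1 or later, where `pivot` accepts a list for `index`.

```python
import pandas as pd

# Hypothetical sample mirroring the wide layout in the question (Apple only).
wide = pd.DataFrame({
    "Name": ["Apple"] * 4,
    "ISIN": ["US9835635986"] * 4,
    "Data": ["Price Open", "Price Close", "Price High", "Price Low"],
    "02.01.2010": [12.45, 12.58, 12.78, 12.18],
    "05.01.2010": [13.45, 15.35, 15.85, 13.35],
    "06.01.2010": [12.48, 12.38, 12.83, 12.21],
})

# Step 1: date columns become rows; the old headers land in a "Date" column.
long = wide.melt(["Name", "ISIN", "Data"], var_name="Date")
long["Date"] = pd.to_datetime(long["Date"], format="%d.%m.%Y")

# Step 2: spread the "Data" labels back out into one column per price type.
wanted = ["Date", "Name", "ISIN",
          "Price Open", "Price Close", "Price High", "Price Low"]
tidy = (long.pivot(index=["Date", "Name", "ISIN"], columns="Data", values="value")
            .sort_index(level=["Name", "ISIN", "Date"])
            .reset_index()
            .rename_axis(None, axis=1)   # drop the leftover "Data" columns name
            [wanted])                    # column order requested in the question
print(tidy)
```

The explicit column selection at the end is optional; without it the price columns come out in alphabetical order, as in the output above.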

Another idea is to convert the column names to datetimes first, then reshape with DataFrame.stack and Series.unstack:

L = df.columns.tolist()
df = (df.set_axis(L[:3] + pd.to_datetime(L[3:], format='%d.%m.%Y').tolist(), axis=1)
         .rename_axis('Date', axis=1)
         .set_index(L[:3])
         .stack()
         .unstack(2)
         .reorder_levels([2,0,1])
         .reset_index())
print (df)
Data       Date       Name          ISIN  Price Close  Price High  Price Low  \
0    2010-01-02      Apple  US9835635986        12.58       12.78      12.18   
1    2010-01-05      Apple  US9835635986        15.35       15.85      13.35   
2    2010-01-06      Apple  US9835635986        12.38       12.83      12.21   
3    2021-12-31      Apple  US9835635986        54.43       54.91      53.98   
4    2010-01-02  Microsoft  US1223928384          NaN         NaN        NaN   
5    2010-01-05  Microsoft  US1223928384          NaN         NaN        NaN   
6    2010-01-06  Microsoft  US1223928384          NaN         NaN        NaN   
7    2021-12-31  Microsoft  US1223928384          NaN         NaN        NaN   

Data  Price Open  
0          12.45  
1          13.45  
2          12.48  
3          54.12  
4          12.45  
5          13.45  
6          12.48  
7          43.56
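
The stack/unstack variant can be sketched the same way. This is a simplified take rather than the exact chain above: it sets the index first so that only date labels remain as columns, and it refers to index levels by name instead of by position. The frame `wide` is again a hypothetical Apple-only sample, and `indexed`, `tidy2` and `wanted` are illustrative names.

```python
import pandas as pd

# Hypothetical sample mirroring the wide layout in the question (Apple only).
wide = pd.DataFrame({
    "Name": ["Apple"] * 4,
    "ISIN": ["US9835635986"] * 4,
    "Data": ["Price Open", "Price Close", "Price High", "Price Low"],
    "02.01.2010": [12.45, 12.58, 12.78, 12.18],
    "05.01.2010": [13.45, 15.35, 15.85, 13.35],
    "06.01.2010": [12.48, 12.38, 12.83, 12.21],
})

# Move the identifier columns into the index so every remaining column is a date.
indexed = wide.set_index(["Name", "ISIN", "Data"])
indexed.columns = pd.to_datetime(indexed.columns, format="%d.%m.%Y")
indexed = indexed.rename_axis("Date", axis=1)

# stack() turns the date columns into an index level;
# unstack("Data") turns the price-type level back into columns.
wanted = ["Date", "Name", "ISIN",
          "Price Open", "Price Close", "Price High", "Price Low"]
tidy2 = (indexed.stack()
                .unstack("Data")
                .reorder_levels(["Date", "Name", "ISIN"])
                .reset_index()
                .rename_axis(None, axis=1)
                [wanted])
print(tidy2)
```

Using level names rather than positions such as `unstack(2)` makes the chain a little easier to read and less sensitive to the order of the identifier columns.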
