Reshape data with double-string headers in pandas

Question

I have the following data frame:

                    AAPL,Price AAPL,Volume  ...  GOOG,Ivol  GOOG,Shares
  Date                                                                
0 2019-12-25              21.5        1879            20.0         2010                             
1 2019-12-26              22.1        1887            19.9         2000                              
2 2019-12-27              23.0        1888            19.9         2045       
3 2019-12-30              22.3        1887            NaN          2050                                
4 2019-12-31              22.4        1900            20.1         1998

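I need to run regressions on each feature of these stocks, and I have to attach a specific set of dummy variables. My goal is therefore to reshape the data so that it has a two-level index, with the date as the first level and the stock name as the second, i.e.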

Date        Stock  Price  Volume ... Ivol Shares
2019-12-25  AAPL    21.5    1879 ... 22.1   3121
...         ...    ...    ...    ... ...  ... 
2019-12-25  GOOG    45.8    NaN  ... 20.0   2000 
...
2019-12-25   VER     NaN    NaN  ...  NaN    NaN   
2019-12-26  AAPL   ...  
...                                           
2019-12-31   VER    42.4    1900 ... 50.1   1998

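My problem is that I don't know how to handle the column names: they are strings of the form 'Stock1,Feature1', so I don't know how to build the appropriate two-level index from them. Can anyone help? Note that the features are the same for every stock, i.e. the list of features does not vary by stock. I am, of course, open to other ways of reshaping the data.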

Answer 1

You can reshape it with pd.wide_to_long. To get the format it expects, I would first modify the column labels so that 'AAPL,Price' -> 'Price,AAPL'.

import pandas as pd

# Swap each 'Stock,Feature' label to 'Feature,Stock' so the feature acts as the stub
df.columns = [','.join(x[::-1]) for x in df.columns.str.split(',')]
stubs = df.columns.str.split(',').str[0].unique().tolist()
# ['Price', 'Volume', 'Ivol', 'Shares']

df = df.reset_index('Date')  # wide_to_long needs Date as a column, not the index
df = pd.wide_to_long(df, i='Date', j='Stock', stubnames=stubs, sep=',', suffix='.*')

#df, based on the 4 columns provided:

                  Price  Volume  Ivol  Shares
Date       Stock                             
2019-12-25 AAPL    21.5  1879.0   NaN     NaN
           GOOG     NaN     NaN  20.0  2010.0
2019-12-26 AAPL    22.1  1887.0   NaN     NaN
           GOOG     NaN     NaN  19.9  2000.0
2019-12-27 AAPL    23.0  1888.0   NaN     NaN
           GOOG     NaN     NaN  19.9  2045.0
2019-12-30 AAPL    22.3  1887.0   NaN     NaN
           GOOG     NaN     NaN   NaN  2050.0
2019-12-31 AAPL    22.4  1900.0   NaN     NaN
           GOOG     NaN     NaN  20.1  1998.0
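
To try this end to end, here is a minimal self-contained sketch. The sample frame is an assumption: it is rebuilt by hand from the four columns visible in the question, with Date as a named index. The name long_df is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rebuilt from the four columns shown in the question.
dates = pd.to_datetime(
    ["2019-12-25", "2019-12-26", "2019-12-27", "2019-12-30", "2019-12-31"]
)
df = pd.DataFrame(
    {
        "AAPL,Price": [21.5, 22.1, 23.0, 22.3, 22.4],
        "AAPL,Volume": [1879, 1887, 1888, 1887, 1900],
        "GOOG,Ivol": [20.0, 19.9, 19.9, np.nan, 20.1],
        "GOOG,Shares": [2010, 2000, 2045, 2050, 1998],
    },
    index=pd.Index(dates, name="Date"),
)

# Swap 'Stock,Feature' -> 'Feature,Stock' so each feature becomes a stub name.
df.columns = [",".join(reversed(c.split(","))) for c in df.columns]

# Unique feature names, in order of first appearance.
stubs = list(dict.fromkeys(c.split(",")[0] for c in df.columns))

# Date must be a regular column; the part after the comma becomes the Stock level.
long_df = pd.wide_to_long(
    df.reset_index(), stubnames=stubs, i="Date", j="Stock", sep=",", suffix=".*"
).sort_index()

print(long_df)
```

Feature/stock pairs that are absent from the wide frame (for example Ivol for AAPL in this sample) come out as NaN.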

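Alternatively, and possibly more simply, you can start from the original frame (with the 'Stock,Feature' labels), turn the columns into a MultiIndex, and then stack: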

df.columns = pd.MultiIndex.from_tuples([tuple(x) for x in df.columns.str.split(',')])

df = df.stack(level=0) # Index level won't have a name.
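
If you want the new index level to carry a name, one option is to name the column levels when building the MultiIndex. The following is a sketch under assumptions: the sample frame is rebuilt by hand from the question, and the level names "Stock" and "Feature" are my own choice, not something the data provides.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rebuilt from the four columns shown in the question.
dates = pd.to_datetime(
    ["2019-12-25", "2019-12-26", "2019-12-27", "2019-12-30", "2019-12-31"]
)
df = pd.DataFrame(
    {
        "AAPL,Price": [21.5, 22.1, 23.0, 22.3, 22.4],
        "AAPL,Volume": [1879, 1887, 1888, 1887, 1900],
        "GOOG,Ivol": [20.0, 19.9, 19.9, np.nan, 20.1],
        "GOOG,Shares": [2010, 2000, 2045, 2050, 1998],
    },
    index=pd.Index(dates, name="Date"),
)

# Split each 'Stock,Feature' label into a two-level column index and name the levels.
df.columns = pd.MultiIndex.from_tuples(
    [tuple(c.split(",")) for c in df.columns], names=["Stock", "Feature"]
)

# Move the Stock level from the columns into the row index.
long_df = df.stack(level="Stock")

print(long_df)
```

Depending on the pandas version, the order of the resulting rows and columns may differ, and recent 2.x releases may emit a FutureWarning about the stack implementation; a final sort_index() gives a predictable row order.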
