Multi-index dataframe split and stack

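When I download data from yfinance, I get 8 columns per ticker (Open, High, Low, etc.). Because I download 15 tickers, I end up with 120 columns plus 1 index column (Date). The tickers are laid out side by side, horizontally. See Image 1.

I don't want that many columns spread over 2 levels. I only want the 8 unique columns, plus a new column that identifies the ticker. See Image 2.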

Image 1: Current Form

Image 1, but as raw text:

    Adj Close   ... Volume
DANHOS13.MX FCFE18.MX   FHIPO14.MX  FIBRAHD15.MX    FIBRAMQ12.MX    FIBRAPL14.MX    FIHO12.MX   FINN13.MX   FMTY14.MX   FNOVA17.MX  ... FIBRAPL14.MX    FIHO12.MX   FINN13.MX   FMTY14.MX   FNOVA17.MX  FPLUS16.MX  FSHOP13.MX  FUNO11.MX   FVIA16.MX   TERRA13.MX
Date                                                                                    
2015-01-02  26.065336   NaN 18.526043   NaN 16.337654   18.520781   14.683501   11.301384   9.247743    NaN ... 338697  189552  148064  57  NaN NaN 212451  2649823 NaN 1111343
2015-01-05  24.670488   NaN 18.436762   NaN 15.857328   17.859756   13.795850   11.071105   9.209846    NaN ... 449555  364819  244594  19330   NaN NaN 491587  3317923 NaN 1255128

Image 2: Desired outcome

The code I used is:

import datetime as dt
import yfinance as yf

start = dt.datetime(2015,1,1)
end = dt.datetime.now()

df = yf.download("FUNO11.MX FIBRAMQ12.MX FIHO12.MX DANHOS13.MX FINN13.MX FSHOP13.MX TERRA13.MX FMTY14.MX FIBRAPL14.MX FHIPO14.MX FIBRAHD15.MX FPLUS16.MX FVIA16.MX FNOVA17.MX FCFE18.MX", 
                start = start,
                end = end,
                group_by = 'Ticker',
                actions = True)

I would download the data slightly differently:

import pandas as pd
import yfinance as yf
from datetime import datetime as dt

start = dt(2015,1,1)
end = dt.now()
symbols = ["FUNO11.MX", "FIBRAMQ12.MX", "FIHO12.MX", "DANHOS13.MX", "FINN13.MX", "FSHOP13.MX", "TERRA13.MX", "FMTY14.MX",
           "FIBRAPL14.MX", "FHIPO14.MX", "FIBRAHD15.MX", "FPLUS16.MX", "FVIA16.MX", "FNOVA17.MX", "FCFE18.MX"]

data = yf.download(symbols, start=start, end=end, actions=True)

Then, Option 1:

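def reshaper(symb, dframe):
    # Flatten the two column levels (variable, symbol) into long format
    df = dframe.unstack().reset_index()
    df.columns = ['variable', 'symbol', 'Date', 'Value']
    # Keep a single symbol and pivot its variables back into columns
    df = (df.loc[df.symbol == symb, ['Date', 'variable', 'Value']]
            .pivot_table(index='Date', columns='variable', values='Value')
            .reset_index())
    df.columns.name = ''
    df['Ticker'] = symb
    return df


# DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
h = pd.concat([reshaper(s, data) for s in symbols], ignore_index=True)

h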
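
For comparison, here is a minimal sketch of the same per-ticker idea that slices each ticker out of the column MultiIndex with `xs` rather than going through `pivot_table`. This is a swapped-in technique, not the code above. The frame `wide` and the tickers `AAA.MX`, `BBB.MX`, `CCC.MX` are hypothetical stand-ins for the yfinance result, so the snippet runs without a network connection.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the download: 2 measures x 3 tickers, 4 dates
dates = pd.date_range("2015-01-02", periods=4, freq="D", name="Date")
tickers = ["AAA.MX", "BBB.MX", "CCC.MX"]
columns = pd.MultiIndex.from_product([["Close", "Volume"], tickers])
wide = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                    index=dates, columns=columns)

frames = []
for symb in tickers:
    # Select this ticker on column level 1; the remaining columns are the measures
    part = wide.xs(symb, axis=1, level=1).copy()
    part["Ticker"] = symb
    frames.append(part.reset_index())

# One block of rows per ticker, stacked vertically
combined = pd.concat(frames, ignore_index=True)
print(combined.head())
```

Building a list and calling `pd.concat` once avoids growing a DataFrame inside the loop.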

Option 2: as a one-liner, you can do this:

data.stack().reset_index().rename(columns={'level_1':'Ticker'})
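
To see what the one-liner does, here is a small sketch on synthetic data. The frame `wide` and its tickers are hypothetical stand-ins for the yfinance result, and `rename_axis` is used here in place of `rename` so the example does not depend on how the index levels happen to be named.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the download: 2 measures x 3 tickers, 4 dates
dates = pd.date_range("2015-01-02", periods=4, freq="D", name="Date")
tickers = ["AAA.MX", "BBB.MX", "CCC.MX"]
columns = pd.MultiIndex.from_product([["Close", "Volume"], tickers])
wide = pd.DataFrame(np.arange(24, dtype=float).reshape(4, 6),
                    index=dates, columns=columns)

# Move the ticker level (column level 1) into the rows, then flatten the index
tidy = (wide.stack(level=1)
            .rename_axis(index=["Date", "Ticker"])
            .reset_index())
print(tidy.head())
```

Each row is now one (Date, Ticker) pair, with one column per measure.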

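A slightly simpler version relies on first stacking both column index levels (measure and ticker) to get long-format tidy data, and then unstacking the measure level, keeping date and ticker as the index: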

import yfinance as yf

symbols = ["FUNO11.MX", "FIBRAMQ12.MX", "FIHO12.MX", "DANHOS13.MX", 
           "FINN13.MX", "FSHOP13.MX", "TERRA13.MX", "FMTY14.MX",
           "FIBRAPL14.MX", "FHIPO14.MX", "FIBRAHD15.MX", "FPLUS16.MX", 
           "FVIA16.MX", "FNOVA17.MX", "FCFE18.MX"]

data = yf.download(symbols, start='2015-01-01', end='2020-11-15', actions=True)

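# Stack both column levels (measure, ticker) into a long Series,
# then unstack the measure level back into columns
data_reshape = data.stack(level=[0, 1]).unstack(1)
# Label the second index level so the ticker is identifiable
data_reshape.index = data_reshape.index.set_names(['ticker'], level=[1])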
data_reshape.head()

                         Adj Close      Close  Dividends       High  \
Date       ticker                                                     
2015-01-02 DANHOS13.MX   26.065336  37.000000        0.0  37.400002   
           FHIPO14.MX    18.526043  24.900000        0.0  24.900000   
           FIBRAMQ12.MX  16.337654  24.490000        0.0  25.110001   
           FIBRAPL14.MX  18.520781  26.740801        0.0  27.118500   
           FIHO12.MX     14.683501  21.670000        0.0  22.190001   

                               Low       Open  Stock Splits     Volume  
Date       ticker                                                       
2015-01-02 DANHOS13.MX   36.330002  36.330002           0.0    82849.0  
           FHIPO14.MX    24.900000  24.900000           0.0    94007.0  
           FIBRAMQ12.MX  24.350000  24.990000           0.0  1172917.0  
           FIBRAPL14.MX  26.343100  26.750700           0.0   338697.0  
           FIHO12.MX     21.209999  22.120001           0.0   189552.0
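
If the ticker should end up as an ordinary column, as in the desired outcome, a `reset_index` on the reshaped frame may be enough. The sketch below replays the stack/unstack steps on synthetic data. The frame `wide` and its tickers are hypothetical, chosen only so the example runs offline.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the download: 2 measures x 2 tickers, 3 dates
dates = pd.date_range("2015-01-02", periods=3, freq="D", name="Date")
columns = pd.MultiIndex.from_product([["Close", "Open"], ["AAA.MX", "BBB.MX"]])
wide = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4),
                    index=dates, columns=columns)

# Stack both column levels, then spread the measure level back into columns
reshaped = wide.stack(level=[0, 1]).unstack(1)
reshaped.index = reshaped.index.set_names(["Date", "ticker"])

# Turn the (Date, ticker) index into regular columns
flat = reshaped.reset_index()
print(flat)
```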