Multi-index dataframe split and stack
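When I download data from yfinance, I get 8 columns per ticker (Open, High, Low, etc.). Since I downloaded 15 tickers, I end up with 120 columns plus 1 index column (Date). They are laid out side by side. See Image 1.
I don't want that many columns spread over 2 levels; I only want the 8 unique columns, plus a new column that identifies the ticker. See Image 2.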
Image 1: Current Form
Image 1 as raw text:
Adj Close ... Volume
DANHOS13.MX FCFE18.MX FHIPO14.MX FIBRAHD15.MX FIBRAMQ12.MX FIBRAPL14.MX FIHO12.MX FINN13.MX FMTY14.MX FNOVA17.MX ... FIBRAPL14.MX FIHO12.MX FINN13.MX FMTY14.MX FNOVA17.MX FPLUS16.MX FSHOP13.MX FUNO11.MX FVIA16.MX TERRA13.MX
Date
2015-01-02 26.065336 NaN 18.526043 NaN 16.337654 18.520781 14.683501 11.301384 9.247743 NaN ... 338697 189552 148064 57 NaN NaN 212451 2649823 NaN 1111343
2015-01-05 24.670488 NaN 18.436762 NaN 15.857328 17.859756 13.795850 11.071105 9.209846 NaN ... 449555 364819 244594 19330 NaN NaN 491587 3317923 NaN 1255128
Image 2: Desired outcome
The code I used is:
import datetime as dt
import yfinance as yf

start = dt.datetime(2015, 1, 1)
end = dt.datetime.now()
df = yf.download("FUNO11.MX FIBRAMQ12.MX FIHO12.MX DANHOS13.MX FINN13.MX FSHOP13.MX TERRA13.MX FMTY14.MX FIBRAPL14.MX FHIPO14.MX FIBRAHD15.MX FPLUS16.MX FVIA16.MX FNOVA17.MX FCFE18.MX",
                 start=start,
                 end=end,
                 group_by='Ticker',
                 actions=True)
I would change the data download slightly:
import pandas as pd  # needed by the reshaping code that follows
import yfinance as yf
from datetime import datetime as dt

start = dt(2015, 1, 1)
end = dt.now()
symbols = ["FUNO11.MX", "FIBRAMQ12.MX", "FIHO12.MX", "DANHOS13.MX", "FINN13.MX", "FSHOP13.MX", "TERRA13.MX", "FMTY14.MX",
           "FIBRAPL14.MX", "FHIPO14.MX", "FIBRAHD15.MX", "FPLUS16.MX", "FVIA16.MX", "FNOVA17.MX", "FCFE18.MX"]
data = yf.download(symbols, start=start, end=end, actions=True)
Then
Option 1:
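def reshaper(symb, dframe):
    # Flatten to long form: one row per (variable, symbol, Date)
    df = dframe.unstack().reset_index()
    df.columns = ['variable', 'symbol', 'Date', 'Value']
    # Keep one symbol and pivot the variables back into columns
    df = (df.loc[df.symbol == symb, ['Date', 'variable', 'Value']]
            .pivot_table(index='Date', columns='variable', values='Value')
            .reset_index())
    df.columns.name = ''
    df['Ticker'] = symb
    return df

# DataFrame.append was removed in pandas 2.0; pd.concat builds the same result
h = pd.concat([reshaper(s, data) for s in symbols], ignore_index=True)
h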
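As a possible alternative to the reshaper loop, each ticker's block can be pulled out with a cross-section on the ticker column level and the pieces concatenated. This is only a sketch on a small synthetic frame: the tickers AAA.MX and BBB.MX and the name wide are hypothetical stand-ins for the yfinance result, and it assumes the columns are ordered (measure, ticker) as in Image 1.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded frame: columns are (measure, ticker).
dates = pd.date_range("2015-01-02", periods=3, freq="D", name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAA.MX", "BBB.MX"]]  # hypothetical tickers
)
wide = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# For each ticker, take its cross-section on column level 1 (which drops
# that level), tag it with a Ticker column, then concatenate the pieces.
tickers = wide.columns.get_level_values(1).unique()
long = pd.concat(
    [wide.xs(t, axis=1, level=1).assign(Ticker=t) for t in tickers]
).reset_index()
```

Here long has one row per (Date, ticker) pair and one column per measure, plus the Ticker column.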
Option 2: as a one-liner, you can do:
data.stack().reset_index().rename(columns={'level_1':'Ticker'})
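The rename in the one-liner works because reset_index labels an unnamed index level as level_1. A variant that does not depend on that default is to name the column levels and stack by name. The sketch below uses a synthetic frame; the level names Measure and Ticker and the tickers are assumptions for illustration, not something yfinance is guaranteed to return.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded frame, with named column levels.
dates = pd.date_range("2015-01-02", periods=3, freq="D", name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAA.MX", "BBB.MX"]],  # hypothetical tickers
    names=["Measure", "Ticker"],                   # assumed level names
)
wide = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# Move the Ticker level from the columns into the rows, then turn the
# (Date, Ticker) index into ordinary columns.
long = wide.stack(level="Ticker").reset_index()
long.columns.name = None  # drop the leftover "Measure" label on the columns
```

Depending on the pandas version, stack may or may not drop rows that are entirely NaN, so with real data it can be worth calling dropna(how="all") on the measure columns explicitly.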
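A slightly simpler version relies on first stacking both column index levels (measure and ticker) to get long-format tidy data, and then unstacking on the measure level, keeping ticker and date as the index: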
import yfinance as yf

symbols = ["FUNO11.MX", "FIBRAMQ12.MX", "FIHO12.MX", "DANHOS13.MX",
           "FINN13.MX", "FSHOP13.MX", "TERRA13.MX", "FMTY14.MX",
           "FIBRAPL14.MX", "FHIPO14.MX", "FIBRAHD15.MX", "FPLUS16.MX",
           "FVIA16.MX", "FNOVA17.MX", "FCFE18.MX"]
data = yf.download(symbols, start='2015-01-01', end='2020-11-15', actions=True)

# Stack both column levels into the rows, then move the measure level back to columns
data_reshape = data.stack(level=[0, 1]).unstack(1)
# Name the second index level so it reads as the ticker
data_reshape.index = data_reshape.index.set_names(['ticker'], level=[1])
data_reshape.head()
Adj Close Close Dividends High \
Date ticker
2015-01-02 DANHOS13.MX 26.065336 37.000000 0.0 37.400002
FHIPO14.MX 18.526043 24.900000 0.0 24.900000
FIBRAMQ12.MX 16.337654 24.490000 0.0 25.110001
FIBRAPL14.MX 18.520781 26.740801 0.0 27.118500
FIHO12.MX 14.683501 21.670000 0.0 22.190001
Low Open Stock Splits Volume
Date ticker
2015-01-02 DANHOS13.MX 36.330002 36.330002 0.0 82849.0
FHIPO14.MX 24.900000 24.900000 0.0 94007.0
FIBRAMQ12.MX 24.350000 24.990000 0.0 1172917.0
FIBRAPL14.MX 26.343100 26.750700 0.0 338697.0
FIHO12.MX 21.209999 22.120001 0.0 189552.0
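The result above keeps the ticker in the index, while the question asked for it as an ordinary column. A minimal sketch of that last step, again on a synthetic frame with hypothetical tickers (the names wide, tidy, flat and one are illustrative only):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded frame: columns are (measure, ticker).
dates = pd.date_range("2015-01-02", periods=3, freq="D", name="Date")
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAA.MX", "BBB.MX"]]  # hypothetical tickers
)
wide = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# Same reshaping as in the answer: stack both levels, unstack the measure.
tidy = wide.stack(level=[0, 1]).unstack(1)
tidy.index = tidy.index.set_names(["ticker"], level=[1])

# Turn Date and ticker into regular columns, as the question asked.
flat = tidy.reset_index()

# Or keep the MultiIndex and select a single ticker by its level name.
one = tidy.xs("AAA.MX", level="ticker")
```

Keeping the (Date, ticker) index is convenient for selections such as xs, while the flat form is closer to the layout in Image 2.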