Pandas column multi-index to rows
I am using yfinance to download price history for multiple symbols, which returns a dataframe with multi-index columns. For example:
import yfinance as yf
df = yf.download(tickers = ['AAPL', 'MSFT'], period = '2d')
A similar dataframe can be constructed without yfinance, for example:
import pandas as pd
pd.options.display.float_format = '{:.2f}'.format
import numpy as np
attributes = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
symbols = ['AAPL', 'MSFT']
dates = ['2020-07-23', '2020-07-24']
data = [[[371.38, 202.54], [371.38, 202.54], [388.31, 210.92], [368.04, 202.15], [387.99, 207.19], [49251100, 67457000]],
        [[370.46, 201.30], [370.46, 201.30], [371.88, 202.86], [356.58, 197.51], [363.95, 200.42], [46323800, 39799500]]]
data = np.array(data).reshape(len(dates), len(symbols) * len(attributes))
cols = pd.MultiIndex.from_product([attributes, symbols])
df = pd.DataFrame(data, index=dates, columns=cols)
df
Output:
           Adj Close               Close                High                 Low                Open                Volume
                AAPL      MSFT      AAPL      MSFT      AAPL      MSFT      AAPL      MSFT      AAPL      MSFT        AAPL        MSFT
2020-07-23    371.38    202.54    371.38    202.54    388.31    210.92    368.04    202.15    387.99    207.19  49251100.0  67457000.0
2020-07-24    370.46    201.30    370.46    201.30    371.88    202.86    356.58    197.51    363.95    200.42  46323800.0  39799500.0
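Once I have this dataframe, I want to restructure it so that there is one row per symbol and date. I am currently doing this by looping over the list of symbols, calling the API once per symbol, and appending the results. I am sure there must be a more efficient way: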
df = pd.DataFrame()
symbols = ['AAPL', 'MSFT']
for symbol in symbols:
    # One API call per symbol; tag each result with its ticker and append
    result = yf.download(tickers=symbol, start='2020-07-23', end='2020-07-25')
    result.insert(0, 'symbol', symbol)
    df = pd.concat([df, result])
Example of the desired output:
df
symbol Open High Low Close Adj Close Volume
Date
2020-07-23 AAPL 387.989990 388.309998 368.040009 371.380005 371.380005 49251100
2020-07-24 AAPL 363.950012 371.880005 356.579987 370.459991 370.459991 46323800
2020-07-23 MSFT 207.190002 210.919998 202.149994 202.539993 202.539993 67457000
2020-07-24 MSFT 200.419998 202.860001 197.509995 201.300003 201.300003 39799500
This looks like a simple stacking operation. Let's go with:
df = yf.download(tickers = ['AAPL', 'MSFT'], period = '2d') # Get your data
df.stack(level=1).rename_axis(['Date', 'symbol']).reset_index(level=1)
Output:
symbol Adj Close ... Open Volume
Date ...
2020-07-23 AAPL 371.380005 ... 387.989990 49251100
2020-07-23 MSFT 202.539993 ... 207.190002 67457000
2020-07-24 AAPL 370.459991 ... 363.950012 46323800
2020-07-24 MSFT 201.300003 ... 200.419998 39799500
[4 rows x 7 columns]
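To try the same reshaping without calling the API, here is a minimal sketch that applies the stack to the synthetic dataframe from the question. The names `wide` and `tidy`, the conversion of the dates with `pd.to_datetime`, and the final sort and column reordering (added only so the result matches the desired output in the question) are my own assumptions, not part of the original answer. Recent pandas versions may emit a FutureWarning about the `stack` implementation; the result here should be the same either way.

```python
import numpy as np
import pandas as pd

attributes = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
symbols = ['AAPL', 'MSFT']
dates = pd.to_datetime(['2020-07-23', '2020-07-24'])
data = [[[371.38, 202.54], [371.38, 202.54], [388.31, 210.92],
         [368.04, 202.15], [387.99, 207.19], [49251100, 67457000]],
        [[370.46, 201.30], [370.46, 201.30], [371.88, 202.86],
         [356.58, 197.51], [363.95, 200.42], [46323800, 39799500]]]
data = np.array(data).reshape(len(dates), len(symbols) * len(attributes))

# Wide frame: column level 0 is the attribute, level 1 is the symbol
cols = pd.MultiIndex.from_product([attributes, symbols])
wide = pd.DataFrame(data, index=dates, columns=cols)

# Move the symbol level from the columns into the row index,
# name both index levels, then turn 'symbol' back into a column
tidy = (
    wide.stack(level=1)
        .rename_axis(['Date', 'symbol'])
        .reset_index(level=1)
)

# Optional: group rows by symbol and order columns as in the desired output
column_order = ['symbol', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
tidy = tidy.sort_values('symbol', kind='stable')[column_order]

print(tidy)
```

The stable sort keeps the dates in their original order within each symbol, so the rows come out as AAPL for both dates followed by MSFT for both dates.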