Retrieve stock price data for specific dates in a pandas DataFrame
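I have a pandas DataFrame containing earnings dates, actual and estimated EPS, and estimated and actual revenue for certain stocks. For my example I only have 10 tickers with all of their earnings dates, but I will eventually incorporate all NASDAQ tickers. What is the fastest way to go through the DataFrame, take each date and symbol, and pull that day's stock prices (open, high, low, close)? I know how to retrieve stock prices from the Yahoo Finance API on their own (i.e. download a specific ticker and retrieve prices between a start date and an end date), but I'm not sure how to tie the two together. Thanks.
Below is my sample df and what I would like to see...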
date symbol eps epsEstimated time revenue revenueEstimated
0 2022-01-27 CMCSA 0.77 0.73 bmo 3.033600e+10 3.046110e+10
1 2021-10-28 CMCSA 0.87 0.75 bmo 3.029800e+10 2.976570e+10
2 2021-07-29 CMCSA 0.84 0.67 bmo 2.854600e+10 2.717460e+10
3 2021-04-29 CMCSA 0.76 0.59 bmo 2.720500e+10 2.680920e+10
4 2021-01-28 CMCSA 0.56 0.48 bmo 2.770800e+10 2.309000e+10
.. ... ... ... ... ... ... ...
34 2013-07-24 FB 0.19 0.14 amc 1.813000e+09 1.335895e+09
35 2013-05-01 FB 0.12 0.13 amc 1.458000e+09 1.579500e+09
36 2013-01-30 FB 0.17 0.15 amc 1.585000e+09 1.398529e+09
37 2012-10-23 FB 0.12 0.11 amc 1.262000e+09 1.156833e+09
38 2012-07-26 FB 0.12 0.12 amc 1.184000e+09 1.184000e+09
The result I want (but with values under the new columns):
date symbol eps epsEstimated revenue revenueEstimated Open High Low Close
0 2022-01-27 CMCSA 0.77 0.73 3.033600e+10 3.046110e+10
1 2021-10-28 CMCSA 0.87 0.75 3.029800e+10 2.976570e+10
2 2021-07-29 CMCSA 0.84 0.67 2.854600e+10 2.717460e+10
3 2021-04-29 CMCSA 0.76 0.59 2.720500e+10 2.680920e+10
4 2021-01-28 CMCSA 0.56 0.48 2.770800e+10 2.309000e+10
.. ... ... ... ... ... ... ...
34 2013-07-24 FB 0.19 0.14 1.813000e+09 1.335895e+09
35 2013-05-01 FB 0.12 0.13 1.458000e+09 1.579500e+09
36 2013-01-30 FB 0.17 0.15 1.585000e+09 1.398529e+09
37 2012-10-23 FB 0.12 0.11 1.262000e+09 1.156833e+09
38 2012-07-26 FB 0.12 0.12 1.184000e+09 1.184000e+09
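EDIT: This is what I have so far...
The earnings df is called data1. I created three columns: Day_0, Day_1 and Day_0_Close. The time column holds either amc or bmo: 'amc' means after market close and 'bmo' means before market open. To analyze how the stock price reacts to earnings, I may need to shift the dates, which is why I created these new columns. For bmo, for example, earnings are released before the market opens that day, so I need the previous trading day's date as Day_0 and its close as Day_0_Close. For amc, I need the same day's date as Day_0 and its close as Day_0_Close. Eventually I also need the next day's price, but for now I'm sticking with Day_0_Close until I get this working.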
date symbol eps epsEstimated time revenue revenueEstimated Day_0 Day_1 Day_0_Close
0 2022-01-27 CMCSA 0.770000 0.7300 bmo 3.033600e+10 3.046110e+10 0.0
1 2021-10-28 CMCSA 0.870000 0.7500 bmo 3.029800e+10 2.976570e+10 0.0
2 2021-07-29 CMCSA 0.840000 0.6700 bmo 2.854600e+10 2.717460e+10 0.0
3 2021-04-29 CMCSA 0.760000 0.5900 bmo 2.720500e+10 2.680920e+10 0.0
I have another df called price1 that holds all the stock price data.
Date Open High ... Adj Close Volume ticker
0 1980-03-17 0.000000 0.101881 ... 0.070243 138396 CMCSA
1 1980-03-18 0.000000 0.101881 ... 0.070243 530518 CMCSA
2 1980-03-19 0.000000 0.100798 ... 0.069462 738113 CMCSA
3 1980-03-20 0.000000 0.108385 ... 0.074925 1360895 CMCSA
4 1980-03-21 0.000000 0.111636 ... 0.077267 461320 CMCSA
... ... ... ... ... ... ... ...
71942 2022-02-18 209.389999 210.750000 ... 206.160004 37049400 FB
71943 2022-02-22 202.339996 207.479996 ... 202.080002 39852400 FB
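I then wrote a for loop that goes through each row of data1, pulls out the ticker and date, and looks up the price. But now I get the error "IndexError: index 0 is out of bounds for axis 0 with size 0". The debugger stops at
day_0_close = price1.loc[(price1.ticker == symbol) & (price1.Date == date_0), 'Adj Close'].values[0]
I don't understand why the code sometimes runs but then stops with the error after a few rows.
See below: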
date symbol eps epsEstimated time revenue revenueEstimated \
0 2022-01-27 CMCSA 0.77 0.73 bmo 3.033600e+10 3.046110e+10
1 2021-10-28 CMCSA 0.87 0.75 bmo 3.029800e+10 2.976570e+10
2 2021-07-29 CMCSA 0.84 0.67 bmo 2.854600e+10 2.717460e+10
3 2021-04-29 CMCSA 0.76 0.59 bmo 2.720500e+10 2.680920e+10
4 2021-01-28 CMCSA 0.56 0.48 bmo 2.770800e+10 2.309000e+10
Day_0 Day_1 Day_0_Close
0 2022-01-26 2022-01-27 48.459999
1 2021-10-27 2021-10-28 0.000000
2 0.000000
This is what I have in the for loop so far:
for idx, row in data1.iterrows():
    orig_day = pd.to_datetime(row['date'])
    temp_day = orig_day + pd.tseries.offsets.CustomBusinessDay(1, holidays=nyse.holidays().holidays)
    prev_temp_day = orig_day - pd.tseries.offsets.CustomBusinessDay(1, holidays=nyse.holidays().holidays)
    if row['time'] == 'amc':
        data1.at[idx, 'Day_0'] = orig_day.strftime("%Y-%m-%d")
        data1.at[idx, 'Day_1'] = temp_day.strftime("%Y-%m-%d")
    else:
        data1.at[idx, 'Day_0'] = prev_temp_day.strftime("%Y-%m-%d")
        data1.at[idx, 'Day_1'] = orig_day.strftime("%Y-%m-%d")
    symbol = row['symbol']
    date_0 = row['Day_0']
    date_1 = row['Day_1']
    day_0_close = price1.loc[(price1.ticker == symbol) & (price1.Date == date_0), 'Adj Close'].values[0]
    print(day_0_close)
    data1.at[idx, 'Day_0_Close'] = day_0_close
Thanks for any help.
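A guess at the IndexError, based only on the loop as posted: `iterrows()` hands back each `row` as a copy taken before the loop body runs, so `row['Day_0']` still holds the old value even after `data1.at[idx, 'Day_0']` has been written. Filtering `price1` on that stale value selects nothing, and `.values[0]` on an empty selection raises exactly this error. That would also fit the code appearing to get one row further on each rerun. The sketch below reproduces the effect on tiny made-up stand-ins for `data1` and `price1` (the numbers and the `new_day_0` list are hypothetical) and reads the date back from the frame with a guarded lookup.

```python
import pandas as pd

# Tiny stand-ins for data1 and price1; all values are made up.
data1 = pd.DataFrame({
    "symbol": ["CMCSA", "CMCSA"],
    "eps": [0.77, 0.87],
    "Day_0": ["", ""],
})
price1 = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-26", "2021-10-27"]),
    "ticker": ["CMCSA", "CMCSA"],
    "Adj Close": [48.0, 52.0],
})
new_day_0 = ["2022-01-26", "2021-10-27"]  # hypothetical shifted dates

stale_values = []
closes = []
for pos, (idx, row) in enumerate(data1.iterrows()):
    data1.at[idx, "Day_0"] = new_day_0[pos]
    stale_values.append(row["Day_0"])   # still the value from before the write
    day_0 = pd.Timestamp(data1.at[idx, "Day_0"])  # read back from the frame instead
    match = price1.loc[
        (price1["ticker"] == row["symbol"]) & (price1["Date"] == day_0),
        "Adj Close",
    ]
    # An empty selection is what makes .values[0] raise IndexError, so guard it.
    closes.append(match.iloc[0] if not match.empty else float("nan"))

print(stale_values)
print(closes)
```

Using the local variables that were just computed (rather than `row[...]`) for the lookup would avoid the stale read in the original loop as well.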
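This solution also covers collecting the data; feel free to use all of it, or just adapt the part of the code that merges the data.
First, set up a DataFrame to test this solution: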
import pandas as pd

# Sample earnings data; dates are parsed to datetime so they can be matched against the price index
df = pd.DataFrame({
    'Date': ['2022-01-27','2021-10-28','2021-07-29','2021-04-29','2021-01-28','2013-07-24','2013-05-01','2013-01-30','2012-10-23','2012-07-26'],
    'symbol': ['CMCSA','CMCSA','CMCSA','CMCSA','CMCSA','FB','FB','FB','FB','FB'],
    'eps': [0.77,0.87,0.84,0.76,0.56,0.19,0.12,0.17,0.12,0.12],
    'epsEstimated': [0.73,0.75,0.67,0.59,0.48,0.14,0.13,0.15,0.11,0.12],
    'time': ['bmo','bmo','bmo','bmo','bmo','amc','amc','amc','amc','amc'],
    'revenue': [3.033600e+10,3.029800e+10,2.854600e+10,2.720500e+10,2.770800e+10,1.813000e+09,1.458000e+09,1.585000e+09,1.262000e+09,1.184000e+09],
    'revenueEstimated': [3.046110e+10,3.046110e+10,2.717460e+10,2.680920e+10,2.309000e+10,1.335895e+09,1.579500e+09,1.398529e+09,1.156833e+09,1.184000e+09],
})
df['Date'] = pd.to_datetime(df['Date'])
Note that I named the Date column with a capital D.
df
Date symbol eps epsEstimated time revenue revenueEstimated
0 2022-01-27 CMCSA 0.77 0.73 bmo 3.033600e+10 3.046110e+10
1 2021-10-28 CMCSA 0.87 0.75 bmo 3.029800e+10 3.046110e+10
2 2021-07-29 CMCSA 0.84 0.67 bmo 2.854600e+10 2.717460e+10
3 2021-04-29 CMCSA 0.76 0.59 bmo 2.720500e+10 2.680920e+10
4 2021-01-28 CMCSA 0.56 0.48 bmo 2.770800e+10 2.309000e+10
5 2013-07-24 FB 0.19 0.14 amc 1.813000e+09 1.335895e+09
6 2013-05-01 FB 0.12 0.13 amc 1.458000e+09 1.579500e+09
7 2013-01-30 FB 0.17 0.15 amc 1.585000e+09 1.398529e+09
8 2012-10-23 FB 0.12 0.11 amc 1.262000e+09 1.156833e+09
9 2012-07-26 FB 0.12 0.12 amc 1.184000e+09 1.184000e+09
Downloading the OHLC data:
import yfinance as yf
df_ohlc = yf.download(df['symbol'].unique().tolist(), start=df['Date'].min())[['Open','High','Low','Close']]
df_ohlc
Output (it could not be formatted properly as text, hence the image):
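For readers without the image, here is a minimal sketch of the layout involved. It assumes the downloaded frame has a `Date` index and two-level columns with the price field on top and the ticker underneath; the frame `wide`, the level layout, and all numbers are made up for illustration and are not yfinance output.

```python
import pandas as pd

# Hypothetical wide frame: one column per (price field, ticker) pair.
dates = pd.Index(pd.to_datetime(["2022-01-26", "2022-01-27"]), name="Date")
columns = pd.MultiIndex.from_product([["Open", "Close"], ["CMCSA", "FB"]])
wide = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]],
    index=dates,
    columns=columns,
)

# Move the ticker level from the columns into the index, name both index
# levels, then turn them into ordinary columns: one row per (Date, symbol).
long = wide.stack(level=1).rename_axis(["Date", "symbol"]).reset_index()
print(long)
```

In the long form every row is keyed by `Date` and `symbol`, which is the shape a merge against the earnings table needs.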
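Now we stack the symbol level of the columns, rename it, and reset the index. We want both symbol and Date as ordinary columns so that the data can be merged correctly: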
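# Name the stacked index levels explicitly rather than relying on the default 'level_1' label
df_ohlc = df_ohlc.stack(level=1).rename_axis(['Date', 'symbol']).reset_index()
# Inner join keeps only the earnings rows that have a price for the same date and symbol
data1 = df.merge(df_ohlc, how='inner', on=['Date', 'symbol'])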
Output:
data1
Date symbol eps epsEstimated time revenue revenueEstimated Close High Low Open
0 2022-01-27 CMCSA 0.77 0.73 bmo 3.033600e+10 3.046110e+10 48.009998 50.070000 45.470001 45.470001
1 2021-10-28 CMCSA 0.87 0.75 bmo 3.029800e+10 3.046110e+10 51.900002 52.740002 49.799999 50.400002
2 2021-07-29 CMCSA 0.84 0.67 bmo 2.854600e+10 2.717460e+10 58.110001 59.700001 58.060001 59.200001
3 2021-04-29 CMCSA 0.76 0.59 bmo 2.720500e+10 2.680920e+10 56.400002 56.490002 55.279999 55.980000
4 2021-01-28 CMCSA 0.56 0.48 bmo 2.770800e+10 2.309000e+10 51.599998 52.290001 49.779999 50.000000
5 2013-07-24 FB 0.19 0.14 amc 1.813000e+09 1.335895e+09 26.510000 26.530001 26.049999 26.320000
6 2013-05-01 FB 0.12 0.13 amc 1.458000e+09 1.579500e+09 27.430000 27.920000 27.309999 27.850000
7 2013-01-30 FB 0.17 0.15 amc 1.585000e+09 1.398529e+09 31.240000 31.490000 30.879999 30.980000
8 2012-10-23 FB 0.12 0.11 amc 1.262000e+09 1.156833e+09 19.500000 19.799999 19.100000 19.250000
9 2012-07-26 FB 0.12 0.12 amc 1.184000e+09 1.184000e+09 26.850000 28.230000 26.730000 27.750000
Done: we get the corresponding OHLC values while avoiding any kind of loop.
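The merge above matches prices on the earnings date itself. For the bmo/amc adjustment described in the question's edit, one possible loop-free variant is `pd.merge_asof`, sketched below. It is not the `CustomBusinessDay` approach from the question: it assumes the price table itself can serve as the trading calendar, so "previous trading day" means the latest price row strictly before the earnings date. The frames `earnings` and `prices`, the helper column `lookup`, and all prices are hypothetical.

```python
import pandas as pd

# Hypothetical earnings rows (column names follow the question).
earnings = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-27", "2013-07-24", "2022-01-31"]),
    "symbol": ["CMCSA", "FB", "CMCSA"],
    "time": ["bmo", "amc", "bmo"],
})

# Made-up daily closes in long form, one row per (Date, symbol).
prices = pd.DataFrame({
    "Date": pd.to_datetime([
        "2022-01-26", "2022-01-27", "2022-01-28", "2022-01-31",
        "2013-07-23", "2013-07-24",
    ]),
    "symbol": ["CMCSA"] * 4 + ["FB"] * 2,
    "Close": [10.0, 11.0, 12.0, 13.0, 20.0, 21.0],
})

# amc: look up the earnings date itself.
# bmo: step back one calendar day so the backward search lands on the
# previous trading day, skipping weekends and holidays absent from prices.
one_day = pd.Timedelta(days=1)
earnings["lookup"] = earnings["date"].where(
    earnings["time"] == "amc", earnings["date"] - one_day
)

# merge_asof needs both sides sorted by their key columns.
result = (
    pd.merge_asof(
        earnings.sort_values("lookup"),
        prices.sort_values("Date"),
        left_on="lookup",
        right_on="Date",
        by="symbol",
        direction="backward",
    )
    .rename(columns={"Date": "Day_0", "Close": "Day_0_Close"})
    .drop(columns="lookup")
)
print(result)
```

In this toy data the Monday 2022-01-31 bmo row picks up the Friday 2022-01-28 close. The same call with `direction="forward"` and a suitably shifted lookup date could be used for Day_1, though that is left out here.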