Retrieve stock price data from specific dates in pandas df

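I have a pandas DataFrame containing earnings dates, actual and estimated EPS, and estimated and actual revenue for certain stocks. For my example I only have 10 tickers with all of their earnings dates, but I will eventually incorporate all NASDAQ tickers. Anyway, what is the fastest way to go through the pandas DataFrame, take each date and symbol, and pull the stock price for that day (open, high, low, close)? I know how to retrieve stock prices individually from the Yahoo Finance API (i.e., download a specific ticker and retrieve its prices between a start date and an end date), but I'm not sure how to tie the two together. Thanks.

Below is my sample df and what I would like to see...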

          date symbol   eps  epsEstimated time       revenue  revenueEstimated
0   2022-01-27  CMCSA  0.77          0.73  bmo  3.033600e+10      3.046110e+10
1   2021-10-28  CMCSA  0.87          0.75  bmo  3.029800e+10      2.976570e+10
2   2021-07-29  CMCSA  0.84          0.67  bmo  2.854600e+10      2.717460e+10
3   2021-04-29  CMCSA  0.76          0.59  bmo  2.720500e+10      2.680920e+10
4   2021-01-28  CMCSA  0.56          0.48  bmo  2.770800e+10      2.309000e+10
..         ...    ...   ...           ...  ...           ...               ...
34  2013-07-24     FB  0.19          0.14  amc  1.813000e+09      1.335895e+09
35  2013-05-01     FB  0.12          0.13  amc  1.458000e+09      1.579500e+09
36  2013-01-30     FB  0.17          0.15  amc  1.585000e+09      1.398529e+09
37  2012-10-23     FB  0.12          0.11  amc  1.262000e+09      1.156833e+09
38  2012-07-26     FB  0.12          0.12  amc  1.184000e+09      1.184000e+09

My desired result (but with values under the new columns):

          date symbol   eps  epsEstimated       revenue  revenueEstimated  Open  High  Low  Close
0   2022-01-27  CMCSA  0.77          0.73  3.033600e+10      3.046110e+10
1   2021-10-28  CMCSA  0.87          0.75  3.029800e+10      2.976570e+10
2   2021-07-29  CMCSA  0.84          0.67  2.854600e+10      2.717460e+10
3   2021-04-29  CMCSA  0.76          0.59  2.720500e+10      2.680920e+10
4   2021-01-28  CMCSA  0.56          0.48  2.770800e+10      2.309000e+10
..         ...    ...   ...           ...  ...           ...               ...
34  2013-07-24     FB  0.19          0.14  1.813000e+09      1.335895e+09
35  2013-05-01     FB  0.12          0.13  1.458000e+09      1.579500e+09
36  2013-01-30     FB  0.17          0.15  1.585000e+09      1.398529e+09
37  2012-10-23     FB  0.12          0.11  1.262000e+09      1.156833e+09
38  2012-07-26     FB  0.12          0.12  1.184000e+09      1.184000e+09


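EDIT: This is what I have so far...

The earnings df is called data1. I created three columns: Day_0, Day_1 and Day_0_Close. In the time column, the values are either amc or bmo. 'amc' means after market close and 'bmo' means before market open. To analyze how the stock price reacts to earnings, I may need to shift the dates, which is why I created these new columns. For example, with bmo, since earnings are released before the market opens that day, I need the previous day's date and its closing price as Day_0. With amc, I need today's date as Day_0 and today's closing price as Day_0_Close. Eventually I need the next day's price as well, but for now I'm sticking to Day_0_Close until I can resolve this issue.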


         date symbol        eps  epsEstimated time       revenue  revenueEstimated Day_0 Day_1  Day_0_Close
0  2022-01-27  CMCSA   0.770000        0.7300  bmo  3.033600e+10      3.046110e+10                      0.0
1  2021-10-28  CMCSA   0.870000        0.7500  bmo  3.029800e+10      2.976570e+10                      0.0
2  2021-07-29  CMCSA   0.840000        0.6700  bmo  2.854600e+10      2.717460e+10                      0.0
3  2021-04-29  CMCSA   0.760000        0.5900  bmo  2.720500e+10      2.680920e+10                      0.0

I have another df called price1 that contains all the stock price data.

            Date        Open        High  ...   Adj Close    Volume  ticker
0     1980-03-17    0.000000    0.101881  ...    0.070243    138396   CMCSA
1     1980-03-18    0.000000    0.101881  ...    0.070243    530518   CMCSA
2     1980-03-19    0.000000    0.100798  ...    0.069462    738113   CMCSA
3     1980-03-20    0.000000    0.108385  ...    0.074925   1360895   CMCSA
4     1980-03-21    0.000000    0.111636  ...    0.077267    461320   CMCSA
...          ...         ...         ...  ...         ...       ...     ...
71942 2022-02-18  209.389999  210.750000  ...  206.160004  37049400      FB
71943 2022-02-22  202.339996  207.479996  ...  202.080002  39852400      FB

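Then I created a for loop to iterate over each row in data1, extract the ticker and date, and fetch the price. But now I'm getting the error "IndexError: index 0 is out of bounds for axis 0 with size 0". It breaks at

day_0_close = price1.loc[(price1.ticker == symbol) & (price1.Date == date_0), 'Adj Close'].values[0]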

I don't know why the code sometimes runs but then errors out and stops after a few rows.

See below

        date symbol   eps  epsEstimated time       revenue  revenueEstimated  \
0 2022-01-27  CMCSA  0.77          0.73  bmo  3.033600e+10      3.046110e+10   
1 2021-10-28  CMCSA  0.87          0.75  bmo  3.029800e+10      2.976570e+10   
2 2021-07-29  CMCSA  0.84          0.67  bmo  2.854600e+10      2.717460e+10   
3 2021-04-29  CMCSA  0.76          0.59  bmo  2.720500e+10      2.680920e+10   
4 2021-01-28  CMCSA  0.56          0.48  bmo  2.770800e+10      2.309000e+10   

        Day_0       Day_1  Day_0_Close  
0  2022-01-26  2022-01-27    48.459999  
1  2021-10-27  2021-10-28     0.000000  
2                             0.000000

Here is what I have so far in the for loop

for idx, row in data1.iterrows():

    orig_day = pd.to_datetime(row['date'])


    temp_day = orig_day + pd.tseries.offsets.CustomBusinessDay(1, holidays=nyse.holidays().holidays)
    prev_temp_day = orig_day - pd.tseries.offsets.CustomBusinessDay(1, holidays=nyse.holidays().holidays)

    if row['time'] == 'amc':
        data1.at[idx, 'Day_0'] = orig_day.strftime("%Y-%m-%d")
        data1.at[idx, 'Day_1'] = temp_day.strftime("%Y-%m-%d")
    else:
        data1.at[idx, 'Day_0'] = prev_temp_day.strftime("%Y-%m-%d")
        data1.at[idx, 'Day_1'] = orig_day.strftime("%Y-%m-%d")


    symbol = row['symbol']

    date_0 = row['Day_0']
    date_1 = row['Day_1']

    day_0_close = price1.loc[(price1.ticker == symbol) & (price1.Date == date_0), 'Adj Close'].values[0]
    print(day_0_close)

    data1.at[idx, 'Day_0_Close'] = day_0_close

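Thanks for any help you can provide

This solution also covers data collection; feel free to use all of it, or only the part of the code that handles the data merge.

First, set up a dataframe to test this solution: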

import pandas as pd

df = pd.DataFrame({'Date':['2022-01-27','2021-10-28','2021-07-29','2021-04-29','2021-01-28','2013-07-24','2013-05-01','2013-01-30','2012-10-23','2012-07-26'],
                   'symbol':['CMCSA','CMCSA','CMCSA','CMCSA','CMCSA','FB','FB','FB','FB','FB'],
                   'eps'             :[0.77,0.87,0.84,0.76,0.56,0.19,0.12,0.17,0.12,0.12],
                   'epsEstimated'    :[0.73,0.75,0.67,0.59,0.48,0.14,0.13,0.15,0.11,0.12],
                   'time'            :['bmo','bmo','bmo','bmo','bmo','amc','amc','amc','amc','amc'],
                   'revenue'         :[3.033600e+10,3.029800e+10,2.854600e+10,2.720500e+10,2.770800e+10,1.813000e+09,1.458000e+09,1.585000e+09,1.262000e+09,1.184000e+09],
                   'revenueEstimated':[3.046110e+10,3.046110e+10,2.717460e+10,2.680920e+10,2.309000e+10,1.335895e+09,1.579500e+09,1.398529e+09,1.156833e+09,1.184000e+09]})
df['Date'] = pd.to_datetime(df['Date'])

Note that I named the Date column with a capital D

df
          Date  symbol   eps    epsEstimated    time     revenue revenueEstimated
0   2022-01-27   CMCSA  0.77            0.73    bmo 3.033600e+10     3.046110e+10
1   2021-10-28   CMCSA  0.87            0.75    bmo 3.029800e+10     3.046110e+10
2   2021-07-29   CMCSA  0.84            0.67    bmo 2.854600e+10     2.717460e+10
3   2021-04-29   CMCSA  0.76            0.59    bmo 2.720500e+10     2.680920e+10
4   2021-01-28   CMCSA  0.56            0.48    bmo 2.770800e+10     2.309000e+10
5   2013-07-24      FB  0.19            0.14    amc 1.813000e+09     1.335895e+09
6   2013-05-01      FB  0.12            0.13    amc 1.458000e+09     1.579500e+09
7   2013-01-30      FB  0.17            0.15    amc 1.585000e+09     1.398529e+09
8   2012-10-23      FB  0.12            0.11    amc 1.262000e+09     1.156833e+09
9   2012-07-26      FB  0.12            0.12    amc 1.184000e+09     1.184000e+09

Downloading the dataset with the OHLC information:

import yfinance as yf

df_ohlc = yf.download(df['symbol'].unique().tolist(), start=df['Date'].min())[['Open','High','Low','Close']]
df_ohlc

Output (it can't be formatted properly as text, hence the image):
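Since the image is not reproduced here, a small synthetic frame may help picture the layout. This is only a sketch: it assumes the download comes back with two-level columns (price field on top, ticker below), which is what the `stack(level=1)` call in the next step relies on. The tickers `AAA` and `BBB` and all prices are made up.

```python
import pandas as pd

# Hypothetical tickers and prices, only to illustrate the assumed layout.
dates = pd.to_datetime(["2022-01-26", "2022-01-27"])
columns = pd.MultiIndex.from_product(
    [["Open", "High", "Low", "Close"], ["AAA", "BBB"]]
)
wide = pd.DataFrame(
    [[10.0, 20.0, 11.0, 21.0, 9.5, 19.5, 10.5, 20.5],
     [10.5, 20.5, 12.0, 22.0, 10.0, 20.0, 11.5, 21.5]],
    index=pd.Index(dates, name="Date"),
    columns=columns,
)

# One row per date, one column per (field, ticker) pair.
print(wide)

# Selecting a single field leaves one column per ticker.
print(wide["Close"])
```

In this wide shape the ticker lives in the column labels, so it cannot be used as a merge key until it is moved into a regular column.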

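Now we stack the symbol level of the columns, rename it, and reset the index. We want both symbol and Date as regular columns so that we can merge the data correctly: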

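# Move the ticker level of the columns into the index, name both index levels,
# then turn them into regular columns so they can serve as merge keys.
df_ohlc = df_ohlc.stack(level=1).rename_axis(['Date','symbol']).reset_index()
data1 = df.merge(df_ohlc, how='inner', on=['Date','symbol'])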

Output:

data1

          Date symbol    eps epsEstimated   time         revenue    revenueEstimated    Close        High         Low        Open
0   2022-01-27  CMCSA   0.77    0.73         bmo    3.033600e+10    3.046110e+10    48.009998   50.070000   45.470001   45.470001
1   2021-10-28  CMCSA   0.87    0.75         bmo    3.029800e+10    3.046110e+10    51.900002   52.740002   49.799999   50.400002
2   2021-07-29  CMCSA   0.84    0.67         bmo    2.854600e+10    2.717460e+10    58.110001   59.700001   58.060001   59.200001
3   2021-04-29  CMCSA   0.76    0.59         bmo    2.720500e+10    2.680920e+10    56.400002   56.490002   55.279999   55.980000
4   2021-01-28  CMCSA   0.56    0.48         bmo    2.770800e+10    2.309000e+10    51.599998   52.290001   49.779999   50.000000
5   2013-07-24     FB   0.19    0.14         amc    1.813000e+09    1.335895e+09    26.510000   26.530001   26.049999   26.320000
6   2013-05-01     FB   0.12    0.13         amc    1.458000e+09    1.579500e+09    27.430000   27.920000   27.309999   27.850000
7   2013-01-30     FB   0.17    0.15         amc    1.585000e+09    1.398529e+09    31.240000   31.490000   30.879999   30.980000
8   2012-10-23     FB   0.12    0.11         amc    1.262000e+09    1.156833e+09    19.500000   19.799999   19.100000   19.250000
9   2012-07-26     FB   0.12    0.12         amc    1.184000e+09    1.184000e+09    26.850000   28.230000   26.730000   27.750000
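One caveat: `how='inner'` silently drops any earnings row that has no price row for the same date and symbol. The IndexError in the question is the loop version of the same situation: the boolean filter matched nothing, so `.values` is empty and `[0]` fails. Judging only from the loop shown, one possible cause is that `row` is a copy made by `iterrows()` before `data1.at[...]` writes `Day_0`, so `row['Day_0']` may still hold the old placeholder. A date with no price row would have the same effect. A minimal sketch for surfacing such rows with a left merge and `indicator=True`, using synthetic data and a made-up ticker `AAA`:

```python
import pandas as pd

# Synthetic earnings rows; the second date falls on a Saturday.
earnings = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-27", "2022-01-29"]),
    "symbol": ["AAA", "AAA"],
})

# Synthetic prices, trading days only.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-27", "2022-01-28"]),
    "symbol": ["AAA", "AAA"],
    "Close": [12.0, 13.0],
})

# A left merge keeps every earnings row; the _merge column records
# whether a matching price row was found.
checked = earnings.merge(prices, how="left", on=["Date", "symbol"], indicator=True)
missing = checked[checked["_merge"] == "left_only"]
print(missing[["Date", "symbol"]])
```

Rows listed in `missing` are the ones an inner merge would have dropped and the ones on which the loop would have raised.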

Done: we got the corresponding OHLC values while avoiding any kind of loop.
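The same merge idea can be extended to the Day_0 / Day_1 logic from the question's edit. The following is only a sketch under stated assumptions: every earnings date is itself a trading day, and the price table is used as the trading calendar instead of a holiday calendar. The tickers, the prices, and the names `prev_close`, `next_close` and `Day_1_Close` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic long-format prices: one row per ticker and trading day.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-25", "2022-01-26", "2022-01-27", "2022-01-28"] * 2),
    "symbol": ["AAA"] * 4 + ["BBB"] * 4,
    "Close": [10.0, 11.0, 12.0, 13.0, 20.0, 21.0, 22.0, 23.0],
})

# Synthetic earnings rows; the last one has no matching price row.
earnings = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-27", "2022-01-27", "2022-02-05"]),
    "symbol": ["AAA", "BBB", "AAA"],
    "time": ["bmo", "amc", "bmo"],
})

# Per ticker, attach the previous and next trading day's close to each row.
prices = prices.sort_values(["symbol", "Date"])
grouped = prices.groupby("symbol")["Close"]
prices["prev_close"] = grouped.shift(1)
prices["next_close"] = grouped.shift(-1)

# A left merge keeps every earnings row, even one without a price match.
out = earnings.merge(prices, how="left", on=["Date", "symbol"])

# bmo: Day_0 is the previous trading day, Day_1 is the earnings date.
# amc: Day_0 is the earnings date, Day_1 is the next trading day.
is_bmo = out["time"].eq("bmo")
out["Day_0_Close"] = np.where(is_bmo, out["prev_close"], out["Close"])
out["Day_1_Close"] = np.where(is_bmo, out["Close"], out["next_close"])
out = out.drop(columns=["Close", "prev_close", "next_close"])
print(out)
```

Earnings rows without a matching price row (the third one here) come back as NaN instead of raising, so they can be inspected afterwards.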