How can I compute one-day-ahead returns in a pandas DataFrame?

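I have been working on a project in which I am trying to compute log returns, expressed as percentages, over a period of time. I have stored all the daily adjusted closing prices in a pandas DataFrame, as shown below: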

{'SP500': {Timestamp('2009-12-31 00:00:00'): 1115.0999755859375,
  Timestamp('2010-01-04 00:00:00'): 1132.989990234375,
  Timestamp('2010-01-05 00:00:00'): 1136.52001953125,
  Timestamp('2010-01-06 00:00:00'): 1137.1400146484375,
  Timestamp('2010-01-07 00:00:00'): 1141.68994140625},
 'A': {Timestamp('2009-12-31 00:00:00'): 20.28476333618164,
  Timestamp('2010-01-04 00:00:00'): 20.43492889404297,
  Timestamp('2010-01-05 00:00:00'): 20.21295928955078,
  Timestamp('2010-01-06 00:00:00'): 20.141132354736328,
  Timestamp('2010-01-07 00:00:00'): 20.11502456665039},
 'AAL': {Timestamp('2009-12-31 00:00:00'): 4.562869548797607,
  Timestamp('2010-01-04 00:00:00'): 4.496876239776611,
  Timestamp('2010-01-05 00:00:00'): 5.005957126617432,
  Timestamp('2010-01-06 00:00:00'): 4.79855489730835,
  Timestamp('2010-01-07 00:00:00'): 4.93996524810791},
 'AAP': {Timestamp('2009-12-31 00:00:00'): 38.3176383972168,
  Timestamp('2010-01-04 00:00:00'): 38.22296905517578,
  Timestamp('2010-01-05 00:00:00'): 37.99578857421875,
  Timestamp('2010-01-06 00:00:00'): 38.32709884643555,
  Timestamp('2010-01-07 00:00:00'): 38.3176383972168},
 'AAPL': {Timestamp('2009-12-31 00:00:00'): 6.471692085266113,
  Timestamp('2010-01-04 00:00:00'): 6.572423458099365,
  Timestamp('2010-01-05 00:00:00'): 6.583786487579346,
  Timestamp('2010-01-06 00:00:00'): 6.479064464569092,
  Timestamp('2010-01-07 00:00:00'): 6.467087268829346}}

I take the daily return to be defined as follows: on day t, the return is the log of the price on day t minus the log of the price on day t-1. I applied this code:

import numpy as np
# Log return on day t: ln(P_t) - ln(P_{t-1}); the first row becomes NaN.
for i in df.columns:
    df[i] = np.log(df[i]) - np.log(df[i].shift(1))

I have checked, and it gives me the expected result, $r_t^i = \ln(\text{AdjClosingPrice}_t) - \ln(\text{AdjClosingPrice}_{t-1})$, for each column:

{'SP500': {Timestamp('2009-12-31 00:00:00'): nan,
  Timestamp('2010-01-04 00:00:00'): 0.015916082167126255,
  Timestamp('2010-01-05 00:00:00'): 0.003110831966759875,
  Timestamp('2010-01-06 00:00:00'): 0.0005453718878234426,
  Timestamp('2010-01-07 00:00:00'): 0.003993218354654715},
 'A': {Timestamp('2009-12-31 00:00:00'): nan,
  Timestamp('2010-01-04 00:00:00'): 0.007375607740701007,
  Timestamp('2010-01-05 00:00:00'): -0.010921689703742743,
  Timestamp('2010-01-06 00:00:00'): -0.003559837812704636,
  Timestamp('2010-01-07 00:00:00'): -0.001297083166547086},
 'AAL': {Timestamp('2009-12-31 00:00:00'): nan,
  Timestamp('2010-01-04 00:00:00'): -0.014568725834338547,
  Timestamp('2010-01-05 00:00:00'): 0.10724564178274565,
  Timestamp('2010-01-06 00:00:00'): -0.042313819049169865,
  Timestamp('2010-01-07 00:00:00'): 0.029043486854613887},
 'AAP': {Timestamp('2009-12-31 00:00:00'): nan,
  Timestamp('2010-01-04 00:00:00'): -0.0024737036578925675,
  Timestamp('2010-01-05 00:00:00'): -0.005961292490163306,
  Timestamp('2010-01-06 00:00:00'): 0.008681861089003373,
  Timestamp('2010-01-07 00:00:00'): -0.0002468649409474999},
 'AAPL': {Timestamp('2009-12-31 00:00:00'): nan,
  Timestamp('2010-01-04 00:00:00'): 0.015445029590123394,
  Timestamp('2010-01-05 00:00:00'): 0.0017274020941329127,
  Timestamp('2010-01-06 00:00:00'): -0.0160339066767059,
  Timestamp('2010-01-07 00:00:00'): -0.001850310332141225}}

My question has two parts:

  1. How can I get $r_{t+1}^i$?

Would it be $r_{t+1}^i = \ln(\text{AdjClosingPrice}_{t+1}) - \ln(\text{AdjClosingPrice}_t)$?

  2. Do you know what loop I should use to compute it?

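Shift the result column up, so that today's row holds tomorrow's result, if that is what you mean. In this example, column 'c' is shifted.

import pandas as pd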

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3, 4, 5], 'c': [1, 2, 3, 4, 5]})
print(df)
df.c = df.c.shift(-1)
print(df)

Output of the first print(df):

   a  b  c
0  1  1  1
1  2  2  2
2  3  3  3
3  4  4  4
4  5  5  5

Output of print(df) after df.c = df.c.shift(-1):

   a  b    c
0  1  1  2.0
1  2  2  3.0
2  3  3  4.0
3  4  4  5.0
4  5  5  NaN
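
The same shift can be applied to a return series instead of a toy column. The following is a minimal sketch, assuming the SP500 adjusted closes from the question (rounded here for brevity); the names `prices`, `r_t` and `r_next` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

# SP500 adjusted closes copied from the question (rounded for brevity).
prices = pd.Series(
    [1115.10, 1132.99, 1136.52, 1137.14, 1141.69],
    index=pd.to_datetime(
        ["2009-12-31", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"]
    ),
    name="SP500",
)

# Same-day log return: ln(P_t) - ln(P_{t-1}); the first row is NaN.
r_t = np.log(prices).diff()

# One-day-ahead log return: move every value up one row, so the row
# for day t holds the return realized on the next row's date.
# The last row is NaN because there is no following price.
r_next = r_t.shift(-1)

out = pd.DataFrame({"r_t": r_t, "r_t_plus_1": r_next})
print(out)
```

Note that `shift(-1)` without a `freq` argument moves values by row position, not by calendar day, so the row for 2009-12-31 receives the return of the next available trading day, 2010-01-04.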

Based on the answer above and some tests with various combinations, I found the answer to my question:

# One-day-ahead log return: ln(P_{t+1}) - ln(P_t); the last row becomes NaN.
for i in df.columns:
    df[i] = np.log(df[i].shift(-1)) - np.log(df[i])
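
The explicit loop is probably not needed, since `np.log` and `shift` operate on every column of a DataFrame at once. Below is a hedged sketch that compares the loop with a vectorized form on a two-column subset of the question's data (values rounded); the names `prices`, `looped` and `forward_returns` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# A two-column subset of the prices in the question (values rounded).
prices = pd.DataFrame(
    {
        "SP500": [1115.10, 1132.99, 1136.52, 1137.14, 1141.69],
        "AAPL": [6.4717, 6.5724, 6.5838, 6.4791, 6.4671],
    },
    index=pd.to_datetime(
        ["2009-12-31", "2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07"]
    ),
)

# Loop version, written into a copy so the prices are not overwritten.
looped = prices.copy()
for col in looped.columns:
    looped[col] = np.log(looped[col].shift(-1)) - np.log(looped[col])

# Vectorized version: the same arithmetic applied to all columns at once.
forward_returns = np.log(prices.shift(-1)) - np.log(prices)

# Both approaches agree; the NaN values in the last row compare as equal.
assert np.allclose(looped, forward_returns, equal_nan=True)

# Expressed in percent, as the question asks.
forward_returns_pct = 100 * forward_returns
print(forward_returns_pct.round(4))
```

Keeping the result in a separate variable, rather than assigning back into `df`, leaves the original prices available for later calculations.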