Pandas and finance calculations

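I'm new to Python and I'm writing a stock analysis script. The idea is that the script will eventually take a stock ticker and calculate the Sharpe ratio, the Treynor ratio, and other financial metrics. Right now, I can't get Pandas to work properly: I can't access just one column of the DataFrame to calculate the stock's returns.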

from pandas.io.data import DataReader
from datetime import date, timedelta



def calc_yield(now, old):
    return (now-old)/old


def yield_array(cl):
    array = []
    count = 0
    for i in cl:
        old = cl[count]
        count += 1
        new = cl[count]
        array.append(calc_yield(new, old))
    return array


market = '^GSPC'
ticker = "AAPL"
days = 10

# set start and end dates
edate = date.today() - timedelta(days=1)
sdate = edate - timedelta(days=days)

# Read the stock price data from Yahoo
data = DataReader(ticker, 'yahoo', start=sdate, end=edate)

close = data['Adj Close']


print yield_array(close)

Error:

/Users/Tim/anaconda/bin/python "/Users/Tim/PycharmProjects/Test2/module tests.py"
Traceback (most recent call last):
  File "/Users/Tim/PycharmProjects/Test2/module tests.py", line 35, in <module>
    print yield_array(close)
  File "/Users/Tim/PycharmProjects/Test2/module tests.py", line 16, in yield_array
    new = cl[count]
  File "/Users/Tim/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 484, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/Tim/anaconda/lib/python2.7/site-packages/pandas/tseries/index.py", line 1243, in get_value
    return _maybe_box(self, Index.get_value(self, series, key), series, key)
  File "/Users/Tim/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 1202, in get_value
    return tslib.get_value_box(s, key)
  File "tslib.pyx", line 540, in pandas.tslib.get_value_box (pandas/tslib.c:11833)
  File "tslib.pyx", line 555, in pandas.tslib.get_value_box (pandas/tslib.c:11680)
IndexError: index out of bounds

Process finished with exit code 1

I think I see your problem. Given this function:

def yield_array(cl):
    array = []
    count = 0
    for i in cl:
        old = cl[count]
        count += 1
        print count
        new = cl[count]
        array.append(calc_yield(new, old))
        print old
        print new
    return array

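The problem is that on the last item of cl you still increment count by 1, so count ends up one past the largest valid index of cl. Looking up cl[count] then fails with the IndexError, because that position does not exist. You need to do something like for i in cl[:-1], which skips the last element.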
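As a rough sketch of that fix, the loop can pair each price with the one that follows it, so there is no counter to run past the end. The sample prices below are made up for illustration, and converting the input with list() is an assumption to keep the example independent of how the Series is indexed:

```python
def calc_yield(now, old):
    # Simple return between two consecutive prices
    return (now - old) / old


def yield_array(cl):
    # Work on plain positional values, whatever index the input carries
    values = list(cl)
    # zip stops at the shorter sequence, so the last price is never
    # used as an "old" value and nothing is read past the end
    return [calc_yield(new, old) for old, new in zip(values[:-1], values[1:])]


# Hypothetical prices, floats so that division behaves the same on Python 2 and 3
prices = [100.0, 110.0, 99.0]
returns = yield_array(prices)
```

For n prices this returns n - 1 yields, which is the most the data can support.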

However, there is a much simpler way to do this with vectorization. You can reduce the whole function to:

close = data['Adj Close']
yield_data = close.diff()/close.shift(1)

Or, better yet, you can put the result back into the DataFrame for later use:

close = data['Adj Close']
data['Yield'] = close.diff()/close.shift(1)
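To see what the vectorized version produces without hitting Yahoo, here is a small sketch on a made-up Series (the dates and prices are assumptions for illustration only). It also compares the result with pandas' built-in Series.pct_change(), which should give the same returns up to floating-point rounding:

```python
import pandas as pd

# Hypothetical adjusted closing prices on three consecutive days
close = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.date_range("2014-01-01", periods=3),
    name="Adj Close",
)

# Same expression as above: price change divided by the previous price
manual = close.diff() / close.shift(1)

# Built-in equivalent
builtin = close.pct_change()

# The first row has no previous price, so it is NaN; drop it if needed
clean = manual.dropna()
```

Unlike the loop, the result keeps the date index and has the same length as the input, with NaN in the first position.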