Pandas and finance calculations
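I'm fairly new to Python, and I'm writing a stock analysis script.
The idea is that the script will eventually take a stock ticker and calculate the Sharpe ratio, the Treynor ratio, and other financial information.
Right now I can't get Pandas to work properly. I can't access just one column of the DataFrame to calculate the stock's yield.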
from pandas.io.data import DataReader
from datetime import date, timedelta

def calc_yield(now, old):
    return (now-old)/old

def yield_array(cl):
    array = []
    count = 0
    for i in cl:
        old = cl[count]
        count += 1
        new = cl[count]
        array.append(calc_yield(new, old))
    return array

market = '^GSPC'
ticker = "AAPL"
days = 10

# set start and end dates
edate = date.today() - timedelta(days=1)
sdate = edate - timedelta(days=days)

# Read the stock price data from Yahoo
data = DataReader(ticker, 'yahoo', start=sdate, end=edate)
close = data['Adj Close']
print yield_array(close)
Error:
/Users/Tim/anaconda/bin/python "/Users/Tim/PycharmProjects/Test2/module tests.py"
Traceback (most recent call last):
  File "/Users/Tim/PycharmProjects/Test2/module tests.py", line 35, in <module>
    print yield_array(close)
  File "/Users/Tim/PycharmProjects/Test2/module tests.py", line 16, in yield_array
    new = cl[count]
  File "/Users/Tim/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 484, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/Tim/anaconda/lib/python2.7/site-packages/pandas/tseries/index.py", line 1243, in get_value
    return _maybe_box(self, Index.get_value(self, series, key), series, key)
  File "/Users/Tim/anaconda/lib/python2.7/site-packages/pandas/core/index.py", line 1202, in get_value
    return tslib.get_value_box(s, key)
  File "tslib.pyx", line 540, in pandas.tslib.get_value_box (pandas/tslib.c:11833)
  File "tslib.pyx", line 555, in pandas.tslib.get_value_box (pandas/tslib.c:11680)
IndexError: index out of bounds

Process finished with exit code 1
I think I see your problem. Given this function:
def yield_array(cl):
    array = []
    count = 0
    for i in cl:
        old = cl[count]
        count += 1
        print count
        new = cl[count]
        array.append(calc_yield(new, old))
        print old
        print new
    return array
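The problem is that on the last item of cl you still add 1 to count, which produces an index greater than the largest index in cl. That raises the error, because the code tries to access an index that doesn't exist. You need to do something like for i in cl[:-1], which skips the last element.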
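As a rough sketch of that fix, the loop can stop one element short of the end so the "next" price always exists. This version is written for Python 3 and uses a plain list of made-up prices instead of the downloaded data; sample_prices and sample_yields are illustrative names, not part of the original script.

```python
def calc_yield(now, old):
    """Simple return between two consecutive prices."""
    return (now - old) / old


def yield_array(prices):
    """Return the period-over-period yields for a sequence of prices."""
    values = list(prices)  # accepts a list or a pandas Series
    yields = []
    # Stop one short of the end so values[i + 1] always exists.
    for i in range(len(values) - 1):
        yields.append(calc_yield(values[i + 1], values[i]))
    return yields


# Made-up prices, for illustration only
sample_prices = [100.0, 110.0, 99.0]
sample_yields = yield_array(sample_prices)
print(sample_yields)
```

The result has one fewer element than the input, since the first price has nothing earlier to compare against.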
However, there is a much simpler way to do this with vectorization. You can reduce the whole function to:
close = data['Adj Close']
yield_data = close.diff()/close.shift(1)
Or, better yet, you can put the result back into the DataFrame for later use:
close = data['Adj Close']
data['Yield'] = close.diff()/close.shift(1)
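To see what the vectorized expression produces without downloading anything, here is a small sketch on a made-up price series (the values and dates are assumptions standing in for data['Adj Close']). It also compares the result with pandas' built-in Series.pct_change(), which computes the same period-over-period change up to floating-point rounding.

```python
import pandas as pd

# Made-up closing prices standing in for data['Adj Close']
close = pd.Series(
    [100.0, 110.0, 99.0],
    index=pd.date_range("2014-01-01", periods=3, freq="D"),
)

# The expression from the answer: (today - yesterday) / yesterday
manual = close.diff() / close.shift(1)

# pandas' built-in equivalent
builtin = close.pct_change()

# The first entry is NaN in both, since there is no earlier price.
# max() skips NaN, so this is the largest gap between the two results.
max_diff = (manual - builtin).abs().max()

print(manual)
print(max_diff)
```

If the two agree on your real data as well, pct_change() is probably the more readable choice.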