Zipline: using pandas-datareader to feed in a Google Finance dataframe for non-US based financial markets
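Please note: this question has been successfully answered below by ptrj. I have also written a blog post about my experience with zipline, which you can find here: https://financialzipline.wordpress.com
I am based in South Africa and I am trying to load South African shares into a dataframe so that it can feed zipline with share price information. Let's say I am looking at AdCorp Holdings Limited as listed on the JSE (Johannesburg Stock Exchange):
Google Finance gives me the historical price information:
https://www.google.com/finance/historical?q=JSE%3AADR&ei=5G6OV4ibBIi8UcP-nfgB
Yahoo Finance has no information on the company:
https://finance.yahoo.com/quote/adcorp?ltr=1
Typing the following code into an IPython Notebook gets me a dataframe with the information from Google Finance: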
import datetime
import pandas_datareader.data as web

start = datetime.datetime(2016, 7, 1)
end = datetime.datetime(2016, 7, 18)
f = web.DataReader('JSE:ADR', 'google', start, end)
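If I display f, I see that the information does correspond to what Google Finance shows.
The prices match Google Finance exactly: the entry for 2016-07-18 on the Google Finance website is identical to the row in my dataframe.
However, I am not sure how to load this dataframe so that zipline can use it as a data bundle.
If you look at the example given for buyapple.py, you can see that it simply pulls the data for Apple shares (AAPL) from the ingested data bundle quantopian-quandl. The challenge here is to replace AAPL with JSE:ADR so that it orders 10 JSE:ADR shares a day, fed from the dataframe instead of the data bundle quantopian-quandl, and plots the result on a graph.
Does anyone know how to do this?
There are almost no examples on the net that deal with this...
This is the buyapple.py code as supplied in zipline's examples folder: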
from zipline.api import order, record, symbol


def initialize(context):
    pass


def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data.current(symbol('AAPL'), 'price'))


# Note: this function can be removed if running
# this algorithm on quantopian.com
def analyze(context=None, results=None):
    import matplotlib.pyplot as plt
    # Plot the portfolio and asset data.
    ax1 = plt.subplot(211)
    results.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('Portfolio value (USD)')
    ax2 = plt.subplot(212, sharex=ax1)
    results.AAPL.plot(ax=ax2)
    ax2.set_ylabel('AAPL price (USD)')

    # Show the plot.
    plt.gcf().set_size_inches(18, 8)
    plt.show()


def _test_args():
    """Extra arguments to use when zipline's automated tests run this example.
    """
    import pandas as pd

    return {
        'start': pd.Timestamp('2014-01-01', tz='utc'),
        'end': pd.Timestamp('2014-11-01', tz='utc'),
    }
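EDIT:
I looked at the code that ingests data from Yahoo Finance and modified it a little so that it takes Google Finance data instead. The Yahoo Finance code can be found here: http://www.zipline.io/_modules/zipline/data/bundles/yahoo.html.
This is my code to ingest Google Finance data. Sadly, it is not working. Can someone more fluent in Python assist me?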
import os

import numpy as np
import pandas as pd
from pandas_datareader.data import DataReader
import requests

from zipline.utils.cli import maybe_show_progress


def _cachpath(symbol, type_):
    return '-'.join((symbol.replace(os.path.sep, '_'), type_))


def google_equities(symbols, start=None, end=None):
    """Create a data bundle ingest function from a set of symbols loaded from
    yahoo.

    Parameters
    ----------
    symbols : iterable[str]
        The ticker symbols to load data for.
    start : datetime, optional
        The start date to query for. By default this pulls the full history
        for the calendar.
    end : datetime, optional
        The end date to query for. By default this pulls the full history
        for the calendar.

    Returns
    -------
    ingest : callable
        The bundle ingest function for the given set of symbols.

    Examples
    --------
    This code should be added to ~/.zipline/extension.py

    .. code-block:: python

       from zipline.data.bundles import yahoo_equities, register

       symbols = (
           'AAPL',
           'IBM',
           'MSFT',
       )
       register('my_bundle', yahoo_equities(symbols))

    Notes
    -----
    The sids for each symbol will be the index into the symbols sequence.
    """
    # strict this in memory so that we can reiterate over it
    symbols = tuple(symbols)

    def ingest(environ,
               asset_db_writer,
               minute_bar_writer,  # unused
               daily_bar_writer,
               adjustment_writer,
               calendar,
               cache,
               show_progress,
               output_dir,
               # pass these as defaults to make them 'nonlocal' in py2
               start=start,
               end=end):
        if start is None:
            start = calendar[0]
        if end is None:
            end = None

        metadata = pd.DataFrame(np.empty(len(symbols), dtype=[
            ('start_date', 'datetime64[ns]'),
            ('end_date', 'datetime64[ns]'),
            ('auto_close_date', 'datetime64[ns]'),
            ('symbol', 'object'),
        ]))

        def _pricing_iter():
            sid = 0
            with maybe_show_progress(
                    symbols,
                    show_progress,
                    label='Downloading Google pricing data: ') as it, \
                    requests.Session() as session:
                for symbol in it:
                    path = _cachpath(symbol, 'ohlcv')
                    try:
                        df = cache[path]
                    except KeyError:
                        df = cache[path] = DataReader(
                            symbol,
                            'google',
                            start,
                            end,
                            session=session,
                        ).sort_index()

                    # the start date is the date of the first trade and
                    # the end date is the date of the last trade
                    start_date = df.index[0]
                    end_date = df.index[-1]
                    # The auto_close date is the day after the last trade.
                    ac_date = end_date + pd.Timedelta(days=1)
                    metadata.iloc[sid] = start_date, end_date, ac_date, symbol

                    df.rename(
                        columns={
                            'Open': 'open',
                            'High': 'high',
                            'Low': 'low',
                            'Close': 'close',
                            'Volume': 'volume',
                        },
                        inplace=True,
                    )
                    yield sid, df
                    sid += 1

        daily_bar_writer.write(_pricing_iter(), show_progress=True)

        symbol_map = pd.Series(metadata.symbol.index, metadata.symbol)
        asset_db_writer.write(equities=metadata)

        adjustment_writer.write(splits=pd.DataFrame(), dividends=pd.DataFrame())
        # adjustments = []
        # with maybe_show_progress(
        #         symbols,
        #         show_progress,
        #         label='Downloading Google adjustment data: ') as it, \
        #         requests.Session() as session:
        #     for symbol in it:
        #         path = _cachpath(symbol, 'adjustment')
        #         try:
        #             df = cache[path]
        #         except KeyError:
        #             df = cache[path] = DataReader(
        #                 symbol,
        #                 'google-actions',
        #                 start,
        #                 end,
        #                 session=session,
        #             ).sort_index()
        #         df['sid'] = symbol_map[symbol]
        #         adjustments.append(df)
        # adj_df = pd.concat(adjustments)
        # adj_df.index.name = 'date'
        # adj_df.reset_index(inplace=True)
        # splits = adj_df[adj_df.action == 'SPLIT']
        # splits = splits.rename(
        #     columns={'value': 'ratio', 'date': 'effective_date'},
        # )
        # splits.drop('action', axis=1, inplace=True)
        # dividends = adj_df[adj_df.action == 'DIVIDEND']
        # dividends = dividends.rename(
        #     columns={'value': 'amount', 'date': 'ex_date'},
        # )
        # dividends.drop('action', axis=1, inplace=True)
        # # we do not have this data in the yahoo dataset
        # dividends['record_date'] = pd.NaT
        # dividends['declared_date'] = pd.NaT
        # dividends['pay_date'] = pd.NaT
        # adjustment_writer.write(splits=splits, dividends=dividends)

    return ingest
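As a reading aid for the ingest function above, the per-symbol bookkeeping it performs (cache key, first and last trade date, auto-close date) can be sketched with the standard library alone. This is only an illustration under stated assumptions: `describe_symbol` and the sample trade dates are hypothetical, and plain `datetime.date` objects stand in for the pandas timestamps used in the real code.

```python
import datetime
import os


def _cachpath(symbol, type_):
    # Same helper as in the bundle code: only the OS path separator is replaced.
    return '-'.join((symbol.replace(os.path.sep, '_'), type_))


def describe_symbol(symbol, trade_dates):
    """Hypothetical helper mirroring the metadata row built for each symbol."""
    ordered = sorted(trade_dates)
    start_date, end_date = ordered[0], ordered[-1]
    return {
        'cache_key': _cachpath(symbol, 'ohlcv'),
        'start_date': start_date,
        'end_date': end_date,
        # The auto-close date is the calendar day after the last trade.
        'auto_close_date': end_date + datetime.timedelta(days=1),
        'symbol': symbol,
    }


# Made-up trade dates inside the range queried in the question.
row = describe_symbol('JSE:ADR', [
    datetime.date(2016, 7, 18),
    datetime.date(2016, 7, 1),
    datetime.date(2016, 7, 4),
])
print(row['cache_key'])
print(row['auto_close_date'])
```

Note that the colon in 'JSE:ADR' is left untouched by `_cachpath`, since that helper only replaces the path separator.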
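I followed the tutorials on http://www.zipline.io/ and made it work with the following steps:
Prepare an ingestion function for Google equities.
Use the same code you pasted (based on the file yahoo.py) with the following modification: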
# Replace line
# adjustment_writer.write(splits=pd.DataFrame(), dividends=pd.DataFrame())
# with line
adjustment_writer.write()
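I named the file google.py and copied it into the subdirectory zipline/data/bundles of the zipline install directory. (It can be placed anywhere on the Python path. Alternatively, you can modify zipline/data/bundles/__init__.py so that it can be called the same way as yahoo_equities.)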
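If it is unclear where the zipline install directory is, one way to find it is to ask the interpreter where a package would be imported from. The following is a minimal sketch using only the standard library; `package_dir` is a hypothetical helper name, and it assumes it is run with the same Python interpreter that has zipline installed.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory of an importable package, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package (for example zipline) is missing.
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


bundles = package_dir('zipline.data.bundles')
if bundles is None:
    print('zipline does not appear to be importable from this interpreter')
else:
    print('copy google.py into:', bundles)
```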
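Ingestion (see http://www.zipline.io/bundles.html)
The file to edit is .zipline/extension.py in your home directory. On Windows the home directory is your user directory (C:\Users\your username), and the .zipline folder is hidden there, so you must unhide files to see it. Add the following lines to that file: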
from zipline.data.bundles import register
from zipline.data.bundles.google import google_equities

equities2 = {
    'JSE:ADR',
}

register(
    'my-google-equities-bundle',  # name this whatever you like
    google_equities(equities2),
)
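The location of extension.py described above can also be resolved programmatically, which helps on Windows where the folder is hidden. This is a sketch, not zipline's own code: `extension_path` is a hypothetical helper, and it assumes zipline uses the directory named by the ZIPLINE_ROOT environment variable when that is set and ~/.zipline otherwise.

```python
import os
from pathlib import Path


def extension_path(environ=None):
    """Guess where extension.py lives.

    Assumption: $ZIPLINE_ROOT takes precedence when set; otherwise the
    .zipline folder in the user's home directory is used.
    """
    environ = os.environ if environ is None else environ
    root = environ.get('ZIPLINE_ROOT')
    base = Path(root) if root else Path.home() / '.zipline'
    return base / 'extension.py'


path = extension_path()
print(path, '(exists)' if path.exists() else '(not created yet)')
```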
Then run:
zipline ingest -b my-google-equities-bundle
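The docstring in the bundle code states that each symbol's sid is its index in the symbols sequence. With a single ticker the sid is simply 0, but if more tickers are added later, a tuple keeps the assignment stable from one ingest to the next, whereas a set does not promise any particular iteration order. A small sketch of that mapping follows; `assign_sids` is a hypothetical helper and 'JSE:NPN' is only a placeholder second ticker for illustration.

```python
def assign_sids(symbols):
    """Map each symbol to its position, as the bundle code does via tuple()."""
    return {symbol: sid for sid, symbol in enumerate(tuple(symbols))}


ordered = ('JSE:ADR', 'JSE:NPN')  # 'JSE:NPN' is an illustrative extra ticker
print(assign_sids(ordered))
```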
Test (as in http://www.zipline.io/beginner-tutorial.html)
I took the example file zipline/examples/buyapple.py (the same one you pasted), replaced both occurrences of the symbol 'AAPL' with 'JSE:ADR', renamed it to buyadcorp.py and ran:
python -m zipline run -f buyadcorp.py --bundle my-google-equities-bundle --start 2000-1-1 --end 2014-1-1
The results are consistent with the data downloaded directly from Google Finance.
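The edit to buyapple.py described above touches only the two quoted 'AAPL' strings passed to symbol(). The keyword in record(AAPL=...) and the attribute results.AAPL have to keep a valid Python identifier, because JSE:ADR cannot be used as one. A minimal sketch of that substitution, applied to the handle_data function from the question; `retarget` is a hypothetical helper written for this illustration.

```python
BUYAPPLE_HANDLE_DATA = """\
def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data.current(symbol('AAPL'), 'price'))
"""


def retarget(source, old, new):
    """Swap the ticker inside symbol('...') calls only."""
    return source.replace("symbol('%s')" % old, "symbol('%s')" % new)


buyadcorp = retarget(BUYAPPLE_HANDLE_DATA, 'AAPL', 'JSE:ADR')
print(buyadcorp)
```

The recorded series is therefore still named AAPL in the results, even though it now holds JSE:ADR prices; renaming it to another valid identifier would require changing the record keyword and the plotting code together.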