Zipline: using pandas-datareader to feed in a Google Finance dataframe for non-US based financial markets

Question

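Please note: this question has been successfully answered by ptrj below. I have also written a post on my blog about my experiences with zipline, which you can find here: https://financialzipline.wordpress.com

I'm based in South Africa and I'm trying to load South African shares into a dataframe so that it can feed zipline with share price information. Let's say I'm looking at AdCorp Holdings Limited as listed on the JSE (Johannesburg Stock Exchange):

Google Finance gives me the historical price info: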

https://www.google.com/finance/historical?q=JSE%3AADR&ei=5G6OV4ibBIi8UcP-nfgB

Yahoo Finance has no information on the company.

https://finance.yahoo.com/quote/adcorp?ltr=1

Typing the following code in an IPython Notebook gets me a dataframe with the information from Google Finance:

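# Imports assumed by the snippet (the `web` alias is the usual convention)
import datetime

import pandas_datareader.data as web

start = datetime.datetime(2016, 7, 1)
end = datetime.datetime(2016, 7, 18)
f = web.DataReader('JSE:ADR', 'google', start, end)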

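If I display f, I see that the information matches what Google Finance shows:

The prices come straight from Google Finance; the entry for 2016-07-18 on the Google Finance website matches my dataframe exactly.

However, I'm not sure how to load this dataframe so that zipline can use it as a data bundle.

If you look at the buyapple.py example, you can see that it just pulls the data for Apple shares (AAPL) from the ingested data bundle quantopian-quandl. The challenge here is to replace AAPL with JSE:ADR so that it orders 10 JSE:ADR shares a day, fed from the dataframe instead of the quantopian-quandl bundle, and plots the result on a graph.

Does anyone know how to do this? There are almost no examples online that deal with this...

This is the buyapple.py code as supplied in zipline's examples folder: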

from zipline.api import order, record, symbol


def initialize(context):
    pass


def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data.current(symbol('AAPL'), 'price'))


# Note: this function can be removed if running
# this algorithm on quantopian.com
def analyze(context=None, results=None):
    import matplotlib.pyplot as plt
    # Plot the portfolio and asset data.
    ax1 = plt.subplot(211)
    results.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('Portfolio value (USD)')
    ax2 = plt.subplot(212, sharex=ax1)
    results.AAPL.plot(ax=ax2)
    ax2.set_ylabel('AAPL price (USD)')

    # Show the plot.
    plt.gcf().set_size_inches(18, 8)
    plt.show()


def _test_args():
    """Extra arguments to use when zipline's automated tests run this example.
    """
    import pandas as pd

    return {
        'start': pd.Timestamp('2014-01-01', tz='utc'),
        'end': pd.Timestamp('2014-11-01', tz='utc'),
    }

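Edit:

I looked at the code for ingesting data from Yahoo Finance and modified it a little to make it take in Google Finance data. The Yahoo Finance code can be found here: http://www.zipline.io/_modules/zipline/data/bundles/yahoo.html.

This is my code to ingest Google Finance data. Sadly, it does not work. Can someone more fluent in Python help me?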

import os

import numpy as np
import pandas as pd
from pandas_datareader.data import DataReader
import requests

from zipline.utils.cli import maybe_show_progress


def _cachpath(symbol, type_):
    return '-'.join((symbol.replace(os.path.sep, '_'), type_))


def google_equities(symbols, start=None, end=None):
    """Create a data bundle ingest function from a set of symbols loaded from
    yahoo.

    Parameters
    ----------
    symbols : iterable[str]
        The ticker symbols to load data for.
    start : datetime, optional
        The start date to query for. By default this pulls the full history
        for the calendar.
    end : datetime, optional
        The end date to query for. By default this pulls the full history
        for the calendar.

    Returns
    -------
    ingest : callable
        The bundle ingest function for the given set of symbols.

    Examples
    --------
    This code should be added to ~/.zipline/extension.py

    .. code-block:: python

       from zipline.data.bundles import yahoo_equities, register

       symbols = (
           'AAPL',
           'IBM',
           'MSFT',
       )
       register('my_bundle', yahoo_equities(symbols))

    Notes
    -----
    The sids for each symbol will be the index into the symbols sequence.
    """
    # strict this in memory so that we can reiterate over it
    symbols = tuple(symbols)

    def ingest(environ,
               asset_db_writer,
               minute_bar_writer,  # unused
               daily_bar_writer,
               adjustment_writer,
               calendar,
               cache,
               show_progress,
               output_dir,
               # pass these as defaults to make them 'nonlocal' in py2
               start=start,
               end=end):
        if start is None:
            start = calendar[0]
        if end is None:
            end = None

        metadata = pd.DataFrame(np.empty(len(symbols), dtype=[
            ('start_date', 'datetime64[ns]'),
            ('end_date', 'datetime64[ns]'),
            ('auto_close_date', 'datetime64[ns]'),
            ('symbol', 'object'),
        ]))

        def _pricing_iter():
            sid = 0
            with maybe_show_progress(
                    symbols,
                    show_progress,
                    label='Downloading Google pricing data: ') as it, \
                    requests.Session() as session:
                for symbol in it:
                    path = _cachpath(symbol, 'ohlcv')
                    try:
                        df = cache[path]
                    except KeyError:
                        df = cache[path] = DataReader(
                            symbol,
                            'google',
                            start,
                            end,
                            session=session,
                        ).sort_index()

                    # the start date is the date of the first trade and
                    # the end date is the date of the last trade
                    start_date = df.index[0]
                    end_date = df.index[-1]
                    # The auto_close date is the day after the last trade.
                    ac_date = end_date + pd.Timedelta(days=1)
                    metadata.iloc[sid] = start_date, end_date, ac_date, symbol

                    df.rename(
                        columns={
                            'Open': 'open',
                            'High': 'high',
                            'Low': 'low',
                            'Close': 'close',
                            'Volume': 'volume',
                        },
                        inplace=True,
                    )
                    yield sid, df
                    sid += 1

        daily_bar_writer.write(_pricing_iter(), show_progress=True)

        symbol_map = pd.Series(metadata.symbol.index, metadata.symbol)
        asset_db_writer.write(equities=metadata)

        adjustment_writer.write(splits=pd.DataFrame(), dividends=pd.DataFrame())
        # adjustments = []
        # with maybe_show_progress(
        #         symbols,
        #         show_progress,
        #         label='Downloading Google adjustment data: ') as it, \
        #         requests.Session() as session:
        #     for symbol in it:
        #         path = _cachpath(symbol, 'adjustment')
        #         try:
        #             df = cache[path]
        #         except KeyError:
        #             df = cache[path] = DataReader(
        #                 symbol,
        #                 'google-actions',
        #                 start,
        #                 end,
        #                 session=session,
        #             ).sort_index()

        #         df['sid'] = symbol_map[symbol]
        #         adjustments.append(df)

        # adj_df = pd.concat(adjustments)
        # adj_df.index.name = 'date'
        # adj_df.reset_index(inplace=True)

        # splits = adj_df[adj_df.action == 'SPLIT']
        # splits = splits.rename(
        #     columns={'value': 'ratio', 'date': 'effective_date'},
        # )
        # splits.drop('action', axis=1, inplace=True)

        # dividends = adj_df[adj_df.action == 'DIVIDEND']
        # dividends = dividends.rename(
        #     columns={'value': 'amount', 'date': 'ex_date'},
        # )
        # dividends.drop('action', axis=1, inplace=True)
        # # we do not have this data in the yahoo dataset
        # dividends['record_date'] = pd.NaT
        # dividends['declared_date'] = pd.NaT
        # dividends['pay_date'] = pd.NaT

        # adjustment_writer.write(splits=splits, dividends=dividends)

    return ingest

Answer 1

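I followed the tutorials at http://www.zipline.io/ and got it working with the following steps:

Prepare an ingestion function for Google equities.

Use the same code you pasted (based on the file yahoo.py) with the following modification: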
```
# Replace line
# adjustment_writer.write(splits=pd.DataFrame(), dividends=pd.DataFrame())
# with line
adjustment_writer.write()
```
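I named the file google.py and copied it to the subdirectory zipline/data/bundles of the zipline install directory. (It can be placed anywhere on the Python path. Alternatively, you can modify zipline/data/bundles/__init__.py so that it can be called the same way as yahoo_equities.)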
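To make it clearer what the ingest function does with each symbol, here is a minimal sketch of the per-symbol processing on a stand-in DataFrame. The prices are invented for illustration, and `prepare_daily_bars` is a hypothetical helper name, not part of zipline; it only mirrors the sort, rename, and date bookkeeping steps from the pasted code.

```python
import pandas as pd

# Stand-in for the frame DataReader('JSE:ADR', 'google', start, end) returned:
# capitalised OHLCV columns indexed by date. All values are invented.
raw = pd.DataFrame(
    {
        'Open': [1510.0, 1500.0, 1520.0],
        'High': [1530.0, 1525.0, 1540.0],
        'Low': [1495.0, 1490.0, 1505.0],
        'Close': [1520.0, 1500.0, 1535.0],
        'Volume': [12000, 15000, 9000],
    },
    index=pd.to_datetime(['2016-07-18', '2016-07-14', '2016-07-15']),
)


def prepare_daily_bars(df):
    """Hypothetical helper mirroring the per-symbol steps of _pricing_iter."""
    # Sort by date and switch to the lowercase column names used in the bundle.
    df = df.sort_index().rename(columns={
        'Open': 'open',
        'High': 'high',
        'Low': 'low',
        'Close': 'close',
        'Volume': 'volume',
    })
    start_date = df.index[0]    # date of the first trade
    end_date = df.index[-1]     # date of the last trade
    auto_close_date = end_date + pd.Timedelta(days=1)  # day after the last trade
    return df, start_date, end_date, auto_close_date


bars, start_date, end_date, auto_close_date = prepare_daily_bars(raw)
print(list(bars.columns))
print(start_date.date(), end_date.date(), auto_close_date.date())
```

The three dates are what the pasted code stores in the metadata frame for each symbol before handing it to asset_db_writer.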
Ingest (see http://www.zipline.io/bundles.html)

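Add the following lines to the file .zipline/extension.py in your home directory. On Windows, the home directory is your user directory (C:\Users\your username). The .zipline folder is hidden, so you have to unhide files to see it.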
```
from zipline.data.bundles import register

from zipline.data.bundles.google import google_equities

equities2 = {
    'JSE:ADR',
}

register(
    'my-google-equities-bundle',  # name this whatever you like
    google_equities(equities2),
)
```
And run
```
zipline ingest -b my-google-equities-bundle
```
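The docstring of the ingest function notes that each symbol's sid is its index in the symbols sequence. Here is a small sketch of that mapping, assuming you later add more tickers. A Python set has no guaranteed order, so passing a tuple or list is one way to keep the sids predictable. The second ticker below is a made-up placeholder.

```python
# Symbols in a fixed order; 'JSE:XYZ' is a made-up placeholder ticker.
symbols = ('JSE:ADR', 'JSE:XYZ')

# Mirrors the counter in _pricing_iter: sid 0 for the first symbol,
# sid 1 for the next, and so on.
sid_by_symbol = {symbol: sid for sid, symbol in enumerate(symbols)}
print(sid_by_symbol)
```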
Test (as in http://www.zipline.io/beginner-tutorial.html)

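I took the example file zipline/examples/buyapple.py (the same one you pasted), replaced both occurrences of the symbol 'AAPL' (the two symbol('AAPL') calls) with 'JSE:ADR', renamed it to buyadcorp.py, and ran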
```
python -m zipline run -f buyadcorp.py --bundle my-google-equities-bundle --start 2000-1-1 --end 2014-1-1
```
The outcome was consistent with the data downloaded directly from Google Finance.
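One way to make such a check concrete is to line up the prices recorded during the backtest against the close prices downloaded directly. The sketch below uses two invented series in place of the real data, so the names and values are assumptions for illustration only.

```python
import pandas as pd

dates = pd.to_datetime(['2016-07-14', '2016-07-15', '2016-07-18'])

# Invented stand-ins: prices recorded by the backtest and closes from the download.
recorded = pd.Series([1500.0, 1535.0, 1520.0], index=dates, name='price')
downloaded = pd.Series([1500.0, 1535.0, 1520.0], index=dates, name='price')

# Align on the dates both series share, then look at the largest gap.
recorded_aligned, downloaded_aligned = recorded.align(downloaded, join='inner')
max_abs_diff = (recorded_aligned - downloaded_aligned).abs().max()
print(max_abs_diff)
```

A difference of zero (or within rounding) on the shared dates suggests the bundle carries the same prices as the source.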
