Simon Simon - 1 year ago 446
Python Question

Zipline: using pandas-datareader to feed in Google Finance dataframe for non-US based financial markets

I'm based in South Africa and I'm trying to load South African shares into a dataframe so that it will feed zipline with share price information. Let's say I'm looking at AdCorp Holdings Limited as listed on the JSE (Johannesburg Stock Exchange):

Google Finance gives me the historical price info:

Yahoo Finance has no information on the company.

Typing in the following code within iPython Notebook gets me the dataframe for the information from Google Finance:

import datetime
import pandas_datareader.data as web

start = datetime.datetime(2016, 7, 1)
end = datetime.datetime(2016, 7, 18)
f = web.DataReader('JSE:ADR', 'google', start, end)

If I display f, I see that the information actually corresponds to the info off Google Finance as well:

These are exactly the prices shown on Google Finance: the row for 2016-07-18 on the Google Finance website matches my dataframe.

However, I'm not sure how to load this dataframe so that it can be used by zipline as a data bundle.
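
As a starting point, here is a minimal sketch of the kind of reshaping the dataframe needs before a bundle can consume it: lowercase OHLCV column names and a date-sorted index, mirroring the rename mapping in the ingest code further down. The prices are made up for illustration, and `to_zipline_columns` is a hypothetical helper name, not part of zipline or pandas-datareader.

```python
import pandas as pd

# Column names as returned by DataReader -> names used in the ingest code.
COLUMN_MAP = {
    'Open': 'open',
    'High': 'high',
    'Low': 'low',
    'Close': 'close',
    'Volume': 'volume',
}


def to_zipline_columns(df):
    """Return a copy with lowercase OHLCV columns, sorted by date."""
    out = df.rename(columns=COLUMN_MAP).sort_index()
    missing = set(COLUMN_MAP.values()) - set(out.columns)
    if missing:
        raise ValueError('missing columns: %s' % sorted(missing))
    return out[list(COLUMN_MAP.values())]


# Made-up sample rows, deliberately out of date order.
raw = pd.DataFrame(
    {
        'Open': [101.0, 100.0],
        'High': [103.0, 102.0],
        'Low': [99.0, 98.0],
        'Close': [102.0, 101.0],
        'Volume': [1500, 1200],
    },
    index=pd.to_datetime(['2016-07-18', '2016-07-15']),
)
bars = to_zipline_columns(raw)
print(bars)
```

This only covers the column and ordering conventions; getting the frame into zipline still requires a bundle ingest function.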

If you look at the example in zipline's example folder (code below), you can see that it just pulls the data for Apple shares (AAPL) from the ingested data bundle. The challenge here is to change it so that it orders 10 shares a day using prices fed from the dataframe instead of the data bundle, and plots the result on a graph.

Does anyone know how to do this?
There are almost no examples on the net that deal with this...

This is the code as supplied in zipline's example folder:

from zipline.api import order, record, symbol


def initialize(context):
    # Nothing to set up for this example.
    pass


def handle_data(context, data):
    # Order 10 shares on every bar and record the current price.
    order(symbol('AAPL'), 10)
    record(AAPL=data.current(symbol('AAPL'), 'price'))


# Note: this function can be removed if running
# this algorithm on
def analyze(context=None, results=None):
    import matplotlib.pyplot as plt
    # Plot the portfolio and asset data.
    ax1 = plt.subplot(211)
    results.portfolio_value.plot(ax=ax1)
    ax1.set_ylabel('Portfolio value (USD)')
    ax2 = plt.subplot(212, sharex=ax1)
    results.AAPL.plot(ax=ax2)
    ax2.set_ylabel('AAPL price (USD)')

    # Show the plot.
    plt.gcf().set_size_inches(18, 8)
    plt.show()


def _test_args():
    """Extra arguments to use when zipline's automated tests run this example.
    """
    import pandas as pd

    return {
        'start': pd.Timestamp('2014-01-01', tz='utc'),
        'end': pd.Timestamp('2014-11-01', tz='utc'),
    }

I looked at the code for ingesting data from Yahoo Finance and modified it slightly to take Google Finance data instead. The Yahoo Finance code can be found here:

This is my code to ingest Google Finance data. Sadly, it is not working. Can someone more fluent in Python assist me?

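import os

import numpy as np
import pandas as pd
from pandas_datareader.data import DataReader
import requests

from zipline.utils.cli import maybe_show_progress


def _cachpath(symbol, type_):
    # Build the cache key, e.g. 'JSE:ADR-ohlcv'; any path separator in the
    # symbol is replaced with an underscore.
    return '-'.join((symbol.replace(os.path.sep, '_'), type_))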

def google_equities(symbols, start=None, end=None):
    """Create a data bundle ingest function from a set of symbols loaded from
    Google Finance.

    Parameters
    ----------
    symbols : iterable[str]
        The ticker symbols to load data for.
    start : datetime, optional
        The start date to query for. By default this pulls the full history
        for the calendar.
    end : datetime, optional
        The end date to query for. By default this pulls the full history
        for the calendar.

    Returns
    -------
    ingest : callable
        The bundle ingest function for the given set of symbols.

    Examples
    --------
    This code should be added to ~/.zipline/

    .. code-block:: python

       from zipline.data.bundles import yahoo_equities, register

       symbols = (
           # ticker symbols go here
       )
       register('my_bundle', yahoo_equities(symbols))

    Notes
    -----
    The sids for each symbol will be the index into the symbols sequence.
    """
    # materialize the symbols as a tuple so that we can iterate over them
    # more than once
    symbols = tuple(symbols)

    def ingest(environ,
               asset_db_writer,
               minute_bar_writer,  # unused
               daily_bar_writer,
               adjustment_writer,
               calendar,
               cache,
               show_progress,
               output_dir,
               # pass these as defaults to make them 'nonlocal' in py2
               start=start,
               end=end):
        if start is None:
            start = calendar[0]
        if end is None:
            end = None

        metadata = pd.DataFrame(np.empty(len(symbols), dtype=[
            ('start_date', 'datetime64[ns]'),
            ('end_date', 'datetime64[ns]'),
            ('auto_close_date', 'datetime64[ns]'),
            ('symbol', 'object'),
        ]))

        def _pricing_iter():
            sid = 0
            with maybe_show_progress(
                    symbols,
                    show_progress,
                    label='Downloading Google pricing data: ') as it, \
                    requests.Session() as session:
                for symbol in it:
                    path = _cachpath(symbol, 'ohlcv')
                    try:
                        df = cache[path]
                    except KeyError:
                        df = cache[path] = DataReader(
                            symbol,
                            'google',
                            start,
                            end,
                            session=session,
                        ).sort_index()

                    # the start date is the date of the first trade and
                    # the end date is the date of the last trade
                    start_date = df.index[0]
                    end_date = df.index[-1]
                    # The auto_close date is the day after the last trade.
                    ac_date = end_date + pd.Timedelta(days=1)
                    metadata.iloc[sid] = start_date, end_date, ac_date, symbol

                    df.rename(
                        columns={
                            'Open': 'open',
                            'High': 'high',
                            'Low': 'low',
                            'Close': 'close',
                            'Volume': 'volume',
                        },
                        inplace=True,
                    )
                    yield sid, df
                    sid += 1

        daily_bar_writer.write(_pricing_iter(), show_progress=True)

        symbol_map = pd.Series(metadata.symbol.index, metadata.symbol)
        asset_db_writer.write(equities=metadata)

        adjustment_writer.write(splits=pd.DataFrame(), dividends=pd.DataFrame())
        # adjustments = []
        # with maybe_show_progress(
        #         symbols,
        #         show_progress,
        #         label='Downloading Google adjustment data: ') as it, \
        #         requests.Session() as session:
        #     for symbol in it:
        #         path = _cachpath(symbol, 'adjustment')
        #         try:
        #             df = cache[path]
        #         except KeyError:
        #             df = cache[path] = DataReader(
        #                 symbol,
        #                 'google-actions',
        #                 start,
        #                 end,
        #                 session=session,
        #             ).sort_index()

        #         df['sid'] = symbol_map[symbol]
        #         adjustments.append(df)

        # adj_df = pd.concat(adjustments)
        # adj_df.index.name = 'date'
        # adj_df.reset_index(inplace=True)

        # splits = adj_df[adj_df.action == 'SPLIT']
        # splits = splits.rename(
        #     columns={'value': 'ratio', 'date': 'effective_date'},
        # )
        # splits.drop('action', axis=1, inplace=True)

        # dividends = adj_df[adj_df.action == 'DIVIDEND']
        # dividends = dividends.rename(
        #     columns={'value': 'amount', 'date': 'ex_date'},
        # )
        # dividends.drop('action', axis=1, inplace=True)
        # # we do not have this data in the yahoo dataset
        # dividends['record_date'] = pd.NaT
        # dividends['declared_date'] = pd.NaT
        # dividends['pay_date'] = pd.NaT

        # adjustment_writer.write(splits=splits, dividends=dividends)

    return ingest
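
Two pieces of the ingest function above can be tried in isolation, without zipline: the cache-or-download lookup and the derivation of the start, end and auto-close dates. The sketch below uses a plain dict as the cache and a stand-in fetch function returning made-up prices; `load_with_cache`, `trade_dates` and `fake_fetch` are hypothetical names, and the real zipline cache object may behave differently from a dict.

```python
import pandas as pd


def load_with_cache(cache, key, fetch):
    """Return cache[key]; on a miss, call fetch(), sort by date and store it."""
    try:
        return cache[key]
    except KeyError:
        df = cache[key] = fetch().sort_index()
        return df


def trade_dates(df):
    """First trade date, last trade date, and the day after the last trade."""
    start_date = df.index[0]
    end_date = df.index[-1]
    auto_close_date = end_date + pd.Timedelta(days=1)
    return start_date, end_date, auto_close_date


calls = []


def fake_fetch():
    # Stand-in for the DataReader call; records that a "download" happened.
    calls.append(1)
    return pd.DataFrame(
        {'Close': [101.0, 100.0]},
        index=pd.to_datetime(['2016-07-18', '2016-07-15']),
    )


cache = {}
first = load_with_cache(cache, 'JSE:ADR-ohlcv', fake_fetch)
second = load_with_cache(cache, 'JSE:ADR-ohlcv', fake_fetch)
print(len(calls))          # the fetch ran only once
print(trade_dates(first))
```

Keeping the lookup separate from the download makes it easy to test the date bookkeeping on a small frame before running a full ingest.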

Answer Source

I followed the tutorials and made it work with the following steps:

  1. Prepare an ingestion function for google equities.

    The same code you pasted (based on the Yahoo ingestion file), with the following modification:

    # Replace line
    # adjustment_writer.write(splits=pd.DataFrame(), dividends=pd.DataFrame())
    # with line
    adjustment_writer.write()

    I named the file and copied it to subdirectory zipline/data/bundle of the zipline install directory. (It can be placed anywhere on the python path. Or you can modify zipline/data/bundle/ to be able to call it the same way as yahoo_equities.)

  2. Ingest

    Add the following lines to file .zipline/ in the home directory. On Windows, the home directory is your user directory (C:\Users\your username). The .zipline folder is hidden, so you will have to show hidden files to see it.

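    from zipline.data.bundles import register
    # assumes the file from step 1 is importable as zipline.data.bundles.google
    from zipline.data.bundles.google import google_equities
    equities2 = {'JSE:ADR'}
    register(
        'my-google-equities-bundle',  # name this whatever you like
        google_equities(equities2),
    )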

    And run

    zipline ingest -b my-google-equities-bundle

  3. Test

    I took an example file zipline/examples/ (the same one you pasted), replaced both occurrences of the symbol 'AAPL' with 'JSE:ADR', renamed it, and ran

    python -m zipline run -f --bundle my-google-equities-bundle --start 2000-1-1 --end 2014-1-1

    The outcome was consistent with the data downloaded directly from Google Finance.
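
For anyone who wants to make that consistency check repeatable, here is a minimal sketch that compares two price series date by date. The data is made up, and `prices_match` is a hypothetical helper; with a real zipline results frame the index would likely need normalizing first (time zone and time of day) before the dates line up with a downloaded dataframe.

```python
import numpy as np
import pandas as pd


def prices_match(recorded, reference, rtol=1e-9):
    """True if both series cover the same dates with nearly equal prices."""
    recorded, reference = recorded.align(reference, join='outer')
    # A NaN after an outer align means one side is missing that date.
    if recorded.isnull().any() or reference.isnull().any():
        return False
    return bool(np.allclose(recorded.values, reference.values, rtol=rtol))


# Made-up prices standing in for the recorded and the downloaded series.
dates = pd.to_datetime(['2016-07-14', '2016-07-15', '2016-07-18'])
downloaded = pd.Series([100.0, 102.0, 101.0], index=dates)
recorded_ok = downloaded.copy()
recorded_bad = pd.Series([100.0, 102.5, 101.0], index=dates)
recorded_short = downloaded.iloc[:2]

print(prices_match(recorded_ok, downloaded))
print(prices_match(recorded_bad, downloaded))
print(prices_match(recorded_short, downloaded))
```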