ZacAttack ZacAttack - 1 year ago 244
Python Question

download stock price history from yahoo finance with python 3.5

I am currently seeking a way to load multiple years of stock price history from yahoo finance. I will have a 100+ ticker symbols and I will be downloading data from 1985 to current date. I want the Open, High, Low, Close, Adj Close, Volume loaded into individual DataFrames (pandas) with name of the data frame named as the current ticker.

My Problem is that my variable ticker will not work, and even if it did work how can I store this data??

import csv
import pandas as pd
import as web
import datetime

#This represents the start and end date for the data
start = datetime.datetime(1985, 1, 1)
end = datetime.datetime(2016, 1, 27)

ticker = ['aapl','tvix','ugaz']

i= 0

while i < len(ticker):

f = web.DataReader(ticker, 'yahoo', start, end)

I want to be able to permanently store all this data so for example if I load 30 years of price history today for 100 ticker symbols, then tomorrow I will only have to append one day of data not all 30 years. Also the dataframe seems the most efficient way to organize the data but I am no quite for sure, but I want to do machine learning, and data analysis.

Answer Source

You can use the pandas_datareader module to download the data that you want if it is available on yahoo. You can pip install it or install it any other way you prefer.

The following is a sample script that gets the data for a small list of symbols:

from pandas_datareader import data as dreader

symbols = ['GOOG', 'AAPL', 'MMM', 'ACN', 'A', 'ADP']

pnls = {i:dreader.DataReader(i,'yahoo','1985-01-01','2016-09-01') for i in symbols}

This saves the data in a python dictionary which allows you access the data by using the .get method.

For example, if you wanted to get the data for GOOG, you can do:


This gets the following:

                  Open        High         Low       Close    Volume  \
2004-08-19  100.000168  104.060182   95.960165  100.340176  44871300   
2004-08-20  101.010175  109.080187  100.500174  108.310183  22942800   
2004-08-23  110.750191  113.480193  109.050183  109.400185  18342800   
2004-08-24  111.240189  111.600192  103.570177  104.870176  15319700   
2004-08-25  104.960181  108.000187  103.880180  106.000184   9232100   

            Adj Close  
2004-08-19  50.119968  
2004-08-20  54.100990  
2004-08-23  54.645447  
2004-08-24  52.382705  
2004-08-25  52.947145

Please keep in mind that if a given ticker did not have data for a time interval, the returned data will omit those years or days (obviously).


The main problem with your script (among many other things) is that you are making the same call multiple times (len(ticker) times) with the same output being returned over and over again. This is because the DataReader() function, when provided with a list, will go out and get a data.frame for each element in that list. So, when you write:

i = 0
while i < len(ticker):
    f = web.DataReader(ticker, 'yahoo', start, end)

You are essentially saying: with i = 0, I want you to keep making the web.DataReader(ticker, 'yahoo', start, end) call and getting the same data and then printing it until (through incrementing) the value of i reaches the length of my list of tickers. You could have just gotten the same thing with f = web.DataReader(ticker, 'yahoo', start, end). But even with that, you would not know which data.frame is for which ticker.

Also, one other thing that I find quite shocking is that you want to name every data.frame that gets returned, when you're dealing with 100+ tickers. Why on God's Green Earth would you want to have a 100+ names in your namespace? You can very easily centralize all those names into one dictionary (like I recommended above) and access any data.frame from it whenever you please by using .get() and providing the ticker.

Recap: 1) No need for a while-loop. 2) You have no way of knowing which data.frame is for which ticker if you just provide the whole list of tickers to the DataReader() function. 3) You don't want to have to name 100+ data.frames.

Suggestion: Simplify this whole thing by just using a dictionary (the tickers are the keys and the returned data will be the values) and a list comprehension to loop through the tickers and get their data. This can make your script much easier to read and way shorter, I suspect.

Edit 2:

This is an attempt to provide a solution to the desire to run a job once and then increment the data every day with the latest prices available. This is not the only way of doing it, but I think it's a good way. First, you will need to make a script that gets the data for all your tickers from 1985 to yesterday. This script will need to be run after market hours (or late at night) to capture the latest prices. This can be very similar to the script I have above. The only thing you would need to add is a couple of lines of code to save the data on your machine in your current working directory. The following should do:

from pandas_datareader import data as dreader
symbols = ['GOOG', 'AAPL', 'MMM', 'ACN', 'A', 'ADP']

pnls = {i:dreader.DataReader(i,'yahoo','1985-01-01','2016-09-01') for i in symbols}

for df_name in pnls:
    pnls.get(df_name).to_csv("{}_data.csv".format(df_name), index=True, header=True)

Then, you can write another script, which just grabs data for the same tickers for just today. It, too, would need to run at night (before midnight) so that it can capture data from today.

from datetime import datetime
from pandas_datareader import data as dreader
from pandas_datareader._utils import RemoteDataError

symbols = ['GOOG', 'AAPL', 'MMM', 'ACN', 'A', 'ADP']

    pnls = {i:dreader.DataReader(i,'yahoo',, for i in symbols}
except RemoteDataError:
    pnls = None

if pnls is not None:
    for df_name in pnls:
        with open("{}_data.csv".format(df_name),"a") as outfile:
    print("No data available yet. Please run later.")

This second script should append the latest prices to every data file saved previously using the first script.

Please note that I am working with the assumption that Yahoo (or any other data source) will have the current day's prices available as soon as markets close wherever you are.

I hope this helps.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download