Data_Kid - 18 days ago
Python Question

Storing the results from a function into a retrievable DataFrame in Python

I am new to Python and have just been through a couple of books and tutorials on data analysis and machine learning.

I want to build a classifier and am trying to scrape real-time stock data.

I am using the following function to pull real-time data:

from googlefinance import getQuotes
import json
import pandas as pd
import datetime
import requests

def get_intraday_data(symbol, interval_seconds=301, num_days=10):
    # Specify URL string based on function inputs.
    url_string = ' {0}'.format(symbol.upper())
    url_string += "&i={0}&p={1}d&f=d,o,h,l,c,v".format(interval_seconds,num_days)

    # Request the text, and split by each line
    r = requests.get(url_string).text.split()

    # Split each line by a comma, starting at the 8th line
    r = [line.split(',') for line in r[7:]]

    # Save data in Pandas DataFrame
    df = pd.DataFrame(r, columns= ['Datetime','Close','High','Low','Open','Volume'])

    # Convert UNIX to Datetime format
    df['Datetime'] = df['Datetime'].apply(lambda x: datetime.datetime.fromtimestamp(int(x[1:])))

    return df

When I try to use df after calling the function, I get the following error:

NameError Traceback (most recent call last)
<ipython-input-40-db884686c2f6> in <module>()
18 return df
---> 20 symbol = pd.DataFrame(df)

NameError: name 'df' is not defined

The issue is that I want to store the result in a separate DataFrame and use it later. The function appears to run but does not store the result anywhere. I would appreciate guidance on this.


I'm not familiar enough with computer science terminology to explain this thoroughly, but basically, when you call a function that returns a value, you have to assign that value to a variable if you want to use it afterwards.

df only exists inside your function (this is called scope). When the function returns, the name df is gone.
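A minimal sketch of that scoping behaviour, using a made-up function (`make_rows`) and placeholder values that are not part of the question's code:

```python
def make_rows():
    # `rows` is a local name: it exists only while make_rows() is running.
    rows = [("a", 1), ("b", 2)]  # placeholder values for illustration
    return rows

make_rows()  # the returned list is discarded because nothing captures it

try:
    rows  # look up the function's local name from outside the function
    lookup_failed = False
except NameError:
    # Same kind of error as in the question: the name is not defined here.
    lookup_failed = True

print(lookup_failed)
```

The function ran and returned its list, but because the call's result was never assigned, nothing outside the function refers to it.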

You're doing:

get_intraday_data(symbol, 301,10)

So, after the function has run, the returned value is discarded.

Instead, do the following:

df = get_intraday_data(symbol, 301,10)

Then you can work with df outside the function.
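As a rough illustration that runs without a network request, here is the same pattern with a hypothetical stand-in function (`get_sample_data`, with invented placeholder values) in place of get_intraday_data:

```python
import pandas as pd

def get_sample_data(symbol):
    # Hypothetical stand-in for get_intraday_data: builds a tiny frame
    # locally instead of requesting real quotes.
    df = pd.DataFrame({"Symbol": [symbol, symbol], "Close": [1.0, 2.0]})
    return df

# Capture the returned DataFrame. The name on the left does not have to
# match the name used inside the function.
result = get_sample_data("ABC")

print(result.shape)
print(list(result.columns))
```

Once the return value is bound to a name, it stays available to any later code.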

Alternatively, instead of returning the DataFrame, you can pickle it. At the end of get_intraday_data, replace "return df" with:

df.to_pickle('file1.P')
fname = 'file1.P'
return fname

Then subsequent code has to read the pickled DataFrame:

fname = get_intraday_data(symbol, 301,10)
df = pd.read_pickle(fname)
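A self-contained sketch of the pickle round trip, assuming a hypothetical helper (`save_sample_data`) with placeholder values and a temporary directory instead of the real quote download:

```python
import os
import tempfile

import pandas as pd

def save_sample_data(path):
    # Hypothetical stand-in: build a small frame, write it to disk,
    # and return the file name instead of the frame itself.
    df = pd.DataFrame({"Close": [1.0, 2.0], "Volume": [10, 20]})
    df.to_pickle(path)
    return path

# Write into a fresh temporary directory so the example leaves the
# working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "file1.pkl")

fname = save_sample_data(path)
restored = pd.read_pickle(fname)  # load the frame back from disk

print(restored)
```

Pickling is mainly useful when the data should survive between sessions. Within a single session, assigning the return value is simpler.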