Stanislav Jirák - 1 year ago
Python Question

Overflow error in Python with pandas

I'm following this tutorial on data analysis with pandas: https://www.youtube.com/watch?v=wfTABU8VeoY&list=PLQVvvaa0QuDfHt4XU7vTm22xDegR0v0fQ&index=7 but when I run the following code

import datetime
import pandas as pd
from pandas import DataFrame
import pandas.io.data
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

sp500 = pd.io.data.get_data_yahoo('%5EGSPC', start = datetime.datetime(2015, 10, 15),
end = datetime.datetime(2016, 10, 15))
sp500.to_csv('sp500.csv')

df = pd.read_csv('sp500.csv', index_col = 'Date', parse_dates=True)

df['H-L'] = df['High'] - df.Low
df['100MA'] = pd.rolling_mean(df['Close'], 100)
df['Difference'] = df['Close'].diff()

threedee = plt.figure().gca(projection='3d')
threedee.scatter(df.index, df['H-L'], df['Close'])
threedee.set_xlabel('Index')
threedee.set_zlabel('Close')
threedee.set_ylabel('H-L')

plt.show()


It produces the following error in both Jupyter Notebook and PyCharm:

OverflowError Traceback (most recent call last)
C:\Program Files\Anaconda2\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
305 pass
306 else:
--> 307 return printer(obj)
308 # Finally look for special method names
309 method = get_real_method(obj, self.print_method)

C:\Program Files\Anaconda2\lib\site-packages\IPython\core\pylabtools.py in <lambda>(fig)
225
226 if 'png' in formats:
--> 227 png_formatter.for_type(Figure, lambda fig: print_figure(fig, 'png', **kwargs))
228 if 'retina' in formats or 'png2x' in formats:
229 png_formatter.for_type(Figure, lambda fig: retina_figure(fig, **kwargs))

C:\Program Files\Anaconda2\lib\site-packages\IPython\core\pylabtools.py in print_figure(fig, fmt, bbox_inches, **kwargs)
117
118 bytes_io = BytesIO()
--> 119 fig.canvas.print_figure(bytes_io, **kw)
120 data = bytes_io.getvalue()
121 if fmt == 'svg':


followed by many more frames from various other paths, including matplotlib.py and so on.
What's wrong? It isn't too much data to load, is it?

Answer

Have you tried replacing this line

threedee.scatter(df.index, df['H-L'], df['Close'])

with the following?

threedee.scatter(range(len(df.index)), df['H-L'], df['Close'])

You are passing timestamps as the x values. It is possible that matplotlib cannot work out which numerical values those timestamps should map to.
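To illustrate why raw timestamps can be awkward as coordinates, here is a minimal sketch using only the standard library. It assumes the timestamps end up as integer nanoseconds since the Unix epoch, which is how pandas typically stores a datetime index; whether that is what actually triggers the OverflowError here is a guess, not something the traceback confirms. The `dates` list is a hypothetical stand-in for `df.index`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for df.index: five consecutive days.
dates = [datetime(2015, 10, 15, tzinfo=timezone.utc) + timedelta(days=i)
         for i in range(5)]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ns(d):
    """Return whole nanoseconds elapsed since the Unix epoch."""
    microseconds = (d - EPOCH) // timedelta(microseconds=1)
    return microseconds * 1000


# Raw timestamp values: around 1.4e18, far beyond a 32-bit integer.
raw = [to_epoch_ns(d) for d in dates]

# Row positions: small integers that any plotting code handles easily.
positions = list(range(len(dates)))

print(raw[0], raw[0] > 2**31 - 1)
print(positions)
```

Plotting `positions` instead of the raw values is the same idea as the `range(len(df.index))` replacement above: the x axis then carries row numbers, not dates.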

Edit: unfortunately, this workaround turns the x-axis ticks into a numeric range. But we can always set the tick labels manually:

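# Create the figure explicitly so that fig is available below.
fig = plt.figure()
threedee = fig.gca(projection='3d')
# Plot integer row positions instead of timestamps.
threedee.scatter(range(len(df.index)), df['H-L'], df['Close'])

# Draw once so that the tick labels are populated.
renderer = fig.canvas.get_renderer()
threedee.draw(renderer)
old_xticks = [t.get_text() for t in threedee.xaxis.get_ticklabels()]
# Map each integer tick label back to the date at that row.
new_xticks = [df.index[int(t)].strftime("%Y-%m-%d")
              if t != '' else '' for t in old_xticks]
threedee.xaxis.set_ticklabels(new_xticks)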

threedee.set_xlabel('Index')
threedee.set_zlabel('Close')
threedee.set_ylabel('H-L')

plt.show()
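The relabelling above indexes into `df.index` with whatever tick text matplotlib produced, so a tick that falls outside the data range would raise an IndexError. As a hedged alternative, the mapping from tick position to date label could be wrapped in a small bounds-checked function. This is only a sketch: `make_date_formatter` and `format_tick` are hypothetical names, and `dates` is a stand-in for `df.index`.

```python
from datetime import date, timedelta


def make_date_formatter(dates, fmt="%Y-%m-%d"):
    """Build a function(x, pos) that turns a row position into a date label."""
    def format_tick(x, pos=None):
        i = int(round(x))
        # Ticks between rows or outside the data range get an empty label.
        if abs(x - i) > 1e-9 or not 0 <= i < len(dates):
            return ""
        return dates[i].strftime(fmt)
    return format_tick


# Hypothetical stand-in for df.index: ten consecutive days.
dates = [date(2015, 10, 15) + timedelta(days=i) for i in range(10)]
format_tick = make_date_formatter(dates)

print(format_tick(0))
print(repr(format_tick(-50)))
```

The `(x, pos)` signature matches what `matplotlib.ticker.FuncFormatter` expects, so, assuming the scatter call uses `range(len(df.index))`, something like `threedee.xaxis.set_major_formatter(FuncFormatter(make_date_formatter(df.index)))` should label the ticks without drawing the figure first. I have not tested that on the 3D axes from the question.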