Vladimir Sky Vladimir Sky -4 years ago 191
Python Question

Python .csv to pandas dataframe to draw bokeh's candlestick chart

Need help with this, since I'm struggling to understand both how to convert my data stored in .csv to pandas df, and how to parse data for bokeh.

My .csv file looks similar to this

6:22,30,30,31,31
6:38,30,30,32,32
6:53,30,30,31,32
7:05,30,30,32,32
7:39,31,31,33,33


What I do:

df = pd.DataFrame.from_csv('D:\\Job\\GoogleDrive\\Job\\chatwars.csv')


This gives me df with beautiful five columns of data:

30 30.1 31 31.1
6:22
6:38 30 30 32 32
6:53 30 30 31 32
7:05 30 30 32 32
7:39 31 31 33 33


Though I have no idea why first rows are formatted like that, I suggest I just need to add sort of header to the .csv (like 'time, low, open, close, high').

My question is: how do I parse this data correctly, so the bokeh could draw candlestick chart for me?

I've copied this code for candlestick chart from the tutorial, but can hardly read it (I'm only a week into Python, please bear with me). It does exactly everything I need it to, except reading my df:

df["date"] = pd.to_datetime(df["date"])

inc = df.close > df.open
dec = df.open > df.close
w = 12*60*60*1000 # half day in ms

TOOLS = "pan,wheel_zoom,box_zoom,reset,save"

p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, title = "MSFT Candlestick")
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3

p.segment(df.date, df.high, df.date, df.low, color="black")
p.vbar(df.date[inc], w, df.open[inc], df.close[inc], fill_color="#D5E1DD", line_color="black")
p.vbar(df.date[dec], w, df.open[dec], df.close[dec], fill_color="#F2583E", line_color="black")

output_file("candlestick.html", title="candlestick.py example")

show(p) # open a browser


After applying suggested code from the commentary, I found it easier to modify my .csv like this:

2017-05-19 06:22:00,30,30,31,31
2017-05-19 06:38:00,30,30,32,32
2017-05-19 06:53:00,30,30,31,32


After that, I've changed 'date' on 'time', and the code worked as supposed too! Awesome! :D

Answer Source

The following should fix your first problem. You can set header to None and specify column names so all lines in the csv will be loaded as data.

df = pd.read_csv('D:\\Job\\GoogleDrive\\Job\\chatwars.csv',header=None,names=['time', 'low', 'open', 'close', 'high'])
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download