Fábio Araújo Fábio Araújo - 1 month ago 8
Python Question

How to resample OHLCV data into 5-min?

I have this set of data

2016-08-09 12:39:00,536.7841,536.7849,536.6141,536.7849,0.656
2016-08-09 12:40:00,536.6749,536.6749,536.6749,536.6749,0.2642
2016-08-09 12:41:00,535.84,535.84,535.615,535.615,0.348
2016-08-09 12:42:00,535.5401,535.5401,534.1801,534.1801,0.507
2016-08-09 12:43:00,534.5891,534.8753,534.5891,534.807,0.656
2016-08-09 12:44:00,534.8014,534.878,534.8014,534.8416,0.502
2016-08-09 12:45:00,534.8131,534.8131,534.2303,534.6736,0.552
2016-08-09 12:47:00,534.756,538.5999,534.756,534.7836,0.62647241
2016-08-09 12:48:00,536.0557,536.6864,536.0557,536.6864,1.2614
2016-08-09 12:49:00,536.8966,537.7289,536.8966,537.7289,0.532
2016-08-09 12:50:00,537.9829,539.2199,537.9829,539.2199,0.67752932
2016-08-09 12:51:00,538.5,539.2199,538.5,539.2199,0.43768953


I want to resample it to 5-minutes OHCLV, so I did this code:

import pandas as pd

df= pd.read_csv("C:\Users\Araujo's PC\Desktop\python_scripts\CSV\cex_btc.csv",
names=['timestamps','open','high','low','close','volume'])

df.set_index('timestamps',inplace=True)

ohlc_dict = {
'open':'first',
'high':'max',
'low':'min',
'close':'last',
'volume':'sum'
}

df.resample('5T', how=ohlc_dict)

print df


It appears me this error:


TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'


Can someone help me on this?

Answer Source

You just need to convert the values in your timestamps column to pandas timestamps before setting the index with their values. They are currently just text fields I believe.

df['timestamps'] = pd.to_datetime(df['timestamps'])
df.set_index('timestamps', inplace=True)

>>> df.resample('5T', how=ohlc_dict)

                         high     close      open       low    volume
timestamps                                                           
2016-08-09 12:35:00  536.7849  536.7849  536.7841  536.6141  0.656000
2016-08-09 12:40:00  536.6749  534.8416  536.6749  534.1801  2.277200
2016-08-09 12:45:00  538.5999  537.7289  534.8131  534.2303  2.971872
2016-08-09 12:50:00  539.2199  539.2199  537.9829  537.9829  1.115219

You can also try to parse these when reading the csv:

pd.read_csv(filename, parse_dates=['timestamps'],
            names=['timestamps','open','high','low','close','volume'])