adele adele - 1 year ago 196
Python Question

Issues with Pandas Timeseries Resample

I'm using python 3.5.1 and Pandas 0.18.0 and trying to use this notebook to modify financial tick data as the exercises are of interest to me:

I'm having issues with some of the commands and wondered if this is due to the versions of python and pandas?

For example:

This is the file I am reading in with associated output:

data = pd.read_csv('test30dayes2tickforpython.csv',index_col=0, header=0,parse_dates={"Timestamp" : [0,1]})
Open float64
High float64
Low float64
Last float64
Volume int64
NumberOfTrades int64
BidVolume int64
AskVolume int64
dtype: object

When I try to then create another object like this:

ticks = data.ix[:, ['High','Volume']]

I get NaN values:

High Volume
2015-12-27 23:00:25.000 NaN NaN
2015-12-27 23:01:11.000 NaN NaN

But if I use the column reference instead of names it works:

ticks = data.ix[:, [1,4]]

High Volume
2015-12-27 23:00:25.000 2045.25 1
2015-12-27 23:01:11.000 2045.50 2

Why is this?

Also, the notebook shows another object created:

bars = ticks.Price.resample('1min', how='ohlc')

When I try this I get this error:

bars = ticks.High.resample('60min', how='ohlc')

1 bars = ticks.High.resample('60min', how='ohlc')

AttributeError: 'DataFrame' object has no attribute 'High'

It works if I don't call the High column:

bars = ticks.resample('60min', how='ohlc')

FutureWarning: how in .resample() is deprecated the new syntax is .resample(...).ohlc()

High Volume
open high low close open high low close
2015-12-27 23:00:00 2045.25 2047.75 2045.25 2045.25 1.0 7.0 1.0 5.0

What is the correct command for this please?

I appreciate the notebook is probably not valid for the version of Python/Pandas Im using but as a newbie it is very useful for me so would like to get it working on my data.

Answer Source

There is problem spaces in column names.

print (data.columns)
Index(['Timestamp', ' Open', ' High', ' Low', ' Last', ' Volume',
       ' NumberOfTrades', ' BidVolume', ' AskVolume'],

You can strip this spaces:

data.columns = data.columns.str.strip()
print (data.columns)
Index(['Timestamp', 'Open', 'High', 'Low', 'Last', 'Volume', 'NumberOfTrades',
       'BidVolume', 'AskVolume'],

ticks = data.ix[:, ['High','Volume']]
print (ticks.head())
      High  Volume
0  2045.25       1
1  2045.50       2
2  2045.50       2
3  2045.50       2
4  2045.50       2

Now you can use:

print (ticks.Price.resample('1min', how='ohlc'))

If you dont want remove spaces, add space to column name:

print (ticks[' Price'].resample('1min', how='ohlc'))

But better is use Resampler.ohlc, if pandas version higher as 0.18.0:

print (ticks.Price.resample('1min').ohlc())