adele adele - 6 months ago 52
Python Question

Issues with Pandas Timeseries Resample

I'm using python 3.5.1 and Pandas 0.18.0 and trying to use this notebook to modify financial tick data as the exercises are of interest to me:

I'm having issues with some of the commands and wondered if this is due to the versions of python and pandas?

For example:

This is the file I am reading in with associated output:

data = pd.read_csv('test30dayes2tickforpython.csv',index_col=0, header=0,parse_dates={"Timestamp" : [0,1]})
data.dtypes
Out[80]:
Open float64
High float64
Low float64
Last float64
Volume int64
NumberOfTrades int64
BidVolume int64
AskVolume int64
dtype: object


When I try to then create another object like this:

ticks = data.ix[:, ['High','Volume']]
ticks


I get NaN values:

High Volume
Timestamp
2015-12-27 23:00:25.000 NaN NaN
2015-12-27 23:01:11.000 NaN NaN


But if I use the column reference instead of names it works:

ticks = data.ix[:, [1,4]]
ticks


High Volume
Timestamp
2015-12-27 23:00:25.000 2045.25 1
2015-12-27 23:01:11.000 2045.50 2


Why is this?

Also, the notebook shows another object created:

bars = ticks.Price.resample('1min', how='ohlc')
bars


When I try this I get this error:

bars = ticks.High.resample('60min', how='ohlc')
bars



1 bars = ticks.High.resample('60min', how='ohlc')

AttributeError: 'DataFrame' object has no attribute 'High'


It works if I don't call the High column:

bars = ticks.resample('60min', how='ohlc')
bars



FutureWarning: how in .resample() is deprecated the new syntax is .resample(...).ohlc()


High Volume
open high low close open high low close
Timestamp
2015-12-27 23:00:00 2045.25 2047.75 2045.25 2045.25 1.0 7.0 1.0 5.0


What is the correct command for this please?

I appreciate the notebook is probably not valid for the version of Python/Pandas Im using but as a newbie it is very useful for me so would like to get it working on my data.

Answer

There is problem spaces in column names.

print (data.columns)
Index(['Timestamp', ' Open', ' High', ' Low', ' Last', ' Volume',
       ' NumberOfTrades', ' BidVolume', ' AskVolume'],
      dtype='object')

You can strip this spaces:

data.columns = data.columns.str.strip()
print (data.columns)
Index(['Timestamp', 'Open', 'High', 'Low', 'Last', 'Volume', 'NumberOfTrades',
       'BidVolume', 'AskVolume'],
      dtype='object')

ticks = data.ix[:, ['High','Volume']]
print (ticks.head())
      High  Volume
0  2045.25       1
1  2045.50       2
2  2045.50       2
3  2045.50       2
4  2045.50       2

Now you can use:

print (ticks.Price.resample('1min', how='ohlc'))

If you dont want remove spaces, add space to column name:

print (ticks[' Price'].resample('1min', how='ohlc'))

But better is use Resampler.ohlc, if pandas version higher as 0.18.0:

print (ticks.Price.resample('1min').ohlc())