supernoob supernoob - 1 month ago 6
Python Question

Iterating Through Timeseries One Day at a Time

Massive edit:

Ok, so I have a timeseries dataframe at the minute level. For sake of example, this dataframe is one year's worth of data. I am attempting to create an analytical model that will iterate through this data day-to-day.

The function will:
1) Slice a day's worth of data from the dataframe.
2) Create a 30 minute (of first 30 minutes of day) sub-slice of the daily slice.
3) Pass the data from both slices through the analytical part of the function.
4) Append to new dataframe.
5) Continue iterations until complete.

The dataframe is in the following format:

open_price high low close_price volume price
2015-01-06 14:31:00+00:00 46.3800 46.440 46.29 46.380 560221.0 46.380
2015-01-06 14:32:00+00:00 46.3800 46.400 46.30 46.390 52959.0 46.390
2015-01-06 14:33:00+00:00 46.3900 46.495 46.36 46.470 100100.0 46.470
2015-01-06 14:34:00+00:00 46.4751 46.580 46.41 46.575 85615.0 46.575
2015-01-06 14:35:00+00:00 46.5800 46.610 46.53 46.537 78175.0 46.537


It seems to me that pandas datetimeindex functionalities are the best way to go about this task, but I have no idea where to start.

(1) Seems like I could use the .rollforward functionality, starting with the df start date/time, and roll forward one day through each iteration.

(2) Use a df.loc[mask] to create the subslice.

I'm fairly certain I can figure it out after (2), but once again I'm not very familiar with timeseries analysis or pandas datetimeindex functionalities.

Final dataframe:

high low retrace time
2015-01-06 46.440 46.29 True 47
2015-01-07 46.400 46.30 True 138
2015-01-08 46.495 46.36 False NaN
2015-01-09 46.580 46.41 True 95
2015-01-10 46.610 46.53 False NaN


High = High of first 30 minutes of day

Low = Low of first 30 minutes of day

Retrace = Boolean, if price returned to the open price at some point during the day after the first 30 minutes.

Time = The amount of time (minutes) it took to retrace.

Here's my code that seems to work (thanks all for your help!):

sample = msft_prices.ix[s_date:e_date]
sample = sample.resample('D').mean()
sample = sample.dropna()
sample = sample.index.strftime('%Y-%m-%d')
ORTDF = pd.DataFrame()
ORDF = pd.DataFrame()
list1 = []
list2 = []
def hi_lo(prices):

for i in sample:
list1 = []
if i in prices.index:

ORTDF = prices[i+' 14:30':i+' 15:00']
ORH = max(ORTDF['high']) #integer value
ORHK = ORTDF['high'].idxmax()
ORL = min(ORTDF['low']) #integer value
ORLK = ORTDF['low'].idxmin()
list1.append(ORH)
list1.append(ORL)



if ORHK < ORLK:
dailydf = prices[i+' 14:30':i+' 21:00']
if max(dailydf['high']) > ORH:
ORDH = max(dailydf['high'])
ORDHK = dailydf['high'].idxmax()
touched = 1
time_to_touch = ORDHK - ORHK
time_to_touch = time_to_touch.total_seconds() / 60
list1.append(touched)
list1.append(time_to_touch)
list2.append(list1)
else:
touched = 0
list1.append(touched)
list1.append('NaN')
list2.append(list1)
elif ORHK > ORLK:
dailydf = prices[i+' 14:30':i+' 21:00']
if min(dailydf['low']) < ORL:
ORDL = min(dailydf['low'])
ORDLK = dailydf['low'].idxmin()
touched = 1
time_to_touch = ORDLK - ORLK
time_to_touch = time_to_touch.total_seconds() / 60
list1.append(touched)
list1.append(time_to_touch)
list2.append(list1)
else:
touched = 0
list1.append(touched)
list1.append('NaN')
list2.append(list1)


else:
pass


ORDF = pd.DataFrame(list2, columns=['High', 'Low', 'Retraced', 'Time']).set_index([sample])
return ORDF


This probably isn't the most elegant way to go about it, but hey, it works!

Answer

Read the docs for general reference

Setup (next time please provide this yourself in the question!):

dates = pd.to_datetime(['19 November 2010 9:01', '19 November 2010 9:02', '19 November 2010 9:03',
                       '20 November 2010 9:05', '20 November 2010 9:06', '20 November 2010 9:07'])
df = pd.DataFrame({'low_price': [1.2, 1.8, 1.21, 2., 4., 1.201],  
                  'high_price': [3., 1.8, 1.21, 4., 4.01, 1.201]}, index=dates)
df

                    high_price  low_price
2010-11-19 09:01:00     3.000   1.200
2010-11-19 09:02:00     1.800   1.800
2010-11-19 09:03:00     1.210   1.210
2010-11-20 09:05:00     4.000   2.000
2010-11-20 09:06:00     4.010   4.000
2010-11-20 09:07:00     1.201   1.201

I'll group by Day and then for each day apply a function that computes whether there was a retrace and the time period when it happened. Your question wasn't clear on which column to operate or what is the tolerance level to say "prices are the same", so I put them as options

def retrace_per_day(day, col='high_price', epsilon=0.5):
    """take day data and returns whether there was a retrace.
    If yes, return 1 and the minute in which it did.
    Otherwise return 0 and np.nan"""
    cond = (np.abs(day[col] - day[col][0]) < epsilon)
    cond_index = cond[cond].index
    if len(cond_index) > 1:
        retrace, period = 1, cond_index[1]
    else:
        retrace, period = 0, np.nan
    return pd.Series({'retrace': retrace, 'period' : period})

df.groupby(pd.TimeGrouper('1D')).apply(retrace_per_day)

           period   retrace
2010-11-19  NaN     0.0
2010-11-20  2010-11-20 09:06:00     1.0

You can then use this to merge back into your original dataframe if needed.