user3264280 user3264280 - 6 months ago 15
Python Question

find number of gaps in 10second interval data

My data is organized in 10-second intervals for 24 hours:

2015-10-14 15:01:10 3956.58 0 19 6.21 105.99 42 59.24
2015-10-14 15:01:20 3956.58 0 1 0.81 121.57 42 59.24
2015-10-14 15:01:30 3956.58 0 47 8.29 115.53 42 59.24
2015-10-14 15:01:40 3956.58 0 79 12.19 107.64 42 59.24
..
..
..
2015-10-15 13:01:10 3956.58 0 79 8.02 107.64 42 59.24
2015-10-15 13:01:10 3956.58 0 79 7.95 108.98 42 59.24
2015-10-15 13:01:10 3956.58 0 79 7.07 110.58 42 59.24


I want to check if, for any hourly group, there are intervals that exceed 10 seconds. How do I get the gaps for each group and print it? So far I've the following:

df = pd.read_csv('convertcsv.csv', parse_dates = True, index_col=0,
names=['date', 'hole_depth', 'rop', 'rotary',
'torque', 'hook_load', 'azimuth', 'inclin'])
df['num_gaps'] = df.groupby(df.index.date)
df.groupby(df.index.time)['num_gaps'].sum()


I want the output to be:

timestamp, num_of_gaps
2015-10-15 06:00, 5
2015-10-15 07:00, 0
...

Answer

This is a great answer to get you started. Your case is different as you would like to first group by hour and then look for differences larger than 10 seconds (avoids the date difference problem mentioned in the answer).

So you could try, assuming your DataFrame comes with DateTimeIndex:

import pandas as pd
df['tvalue'] = df.index
time_groups = df.groupby(pd.TimeGrouper('H'))
for hour, data in time_groups:
    data['delta'] = (data['tvalue']-data['tvalue'].shift()).fillna(0)
    data['delta_sec'] = data['delta'].apply(lambda x: x  / np.timedelta64(10,'s'))
    print(data[data.delta_sec > 10])

Just saw your edit - you could of course also just count the values per hour and check if the .count() is lower than the 360 one would expect. In other words,

print(df.groupby(TimeGrouper('H')).size())