Joshua Kidd Joshua Kidd - 1 year ago 77
Python Question

How to find harmonic average speeds from pandas dataframe

I have a pandas dataframe with a column of speeds in KmH, and a column of timestamps:

Date, Speed
2016-07-07 13:38:02.000, 50.718590
2016-07-18 11:28:00.000, 2.357645
2016-07-15 15:03:08.000, 14.652172
2016-07-18 06:53:00.000, 24.530390
... ...
2016-07-18 18:41:31.000, 31.761416
2016-07-14 05:28:42.187, 7.532758


What I want is to have a harmonic average speed for each 15 minutes in an average day:

Time, Speed
00:00, 32
00:15, 10
00:30, 12
00:45, 41
01:00, 12
...
23:30, 30
23:45, 31


My initial attempt was to remove the dates from each timestamp, set it as the index, and use TimeGrouper to find the average. (My dataframe is called output) The code was:

output['Speed'] = output['Speed']**-1
output['Date'] = output['Date'].apply( lambda d : d.time() )
output = output.set_index(['Date'])
output = output.groupby(pd.TimeGrouper('15Min')).mean()
output['Speed'] = output['Speed']**-1


The code didn't work however as it gave me an error:

Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'

Answer Source

I think what you want to do is normalise the dates, then you do the resampling:

In [177]:
df['Date'] = pd.to_datetime(df['Date'].dt.strftime('%H:%M:%S'))
df

Out[177]:
                 Date      Speed
0 2017-02-07 13:38:02  50.718590
1 2017-02-07 11:28:00   2.357645
2 2017-02-07 15:03:08  14.652172
3 2017-02-07 06:53:00  24.530390
4 2017-02-07 18:41:31  31.761416
5 2017-02-07 05:28:42   7.532758

now all dates are same, by default today's date, then do what you want:

In [178]:
output = df.set_index('Date')
output = output.groupby(pd.TimeGrouper('15Min')).mean()
output['Speed'] = output['Speed']**-1
output

Out[178]:
                        Speed
Date                         
2017-02-07 05:15:00  0.132754
2017-02-07 05:30:00       NaN
2017-02-07 05:45:00       NaN
2017-02-07 06:00:00       NaN
2017-02-07 06:15:00       NaN
2017-02-07 06:30:00       NaN
2017-02-07 06:45:00  0.040766
2017-02-07 07:00:00       NaN
2017-02-07 07:15:00       NaN
2017-02-07 07:30:00       NaN
2017-02-07 07:45:00       NaN
2017-02-07 08:00:00       NaN
2017-02-07 08:15:00       NaN
2017-02-07 08:30:00       NaN
2017-02-07 08:45:00       NaN
2017-02-07 09:00:00       NaN
2017-02-07 09:15:00       NaN
2017-02-07 09:30:00       NaN
2017-02-07 09:45:00       NaN
2017-02-07 10:00:00       NaN
2017-02-07 10:15:00       NaN
2017-02-07 10:30:00       NaN
2017-02-07 10:45:00       NaN
2017-02-07 11:00:00       NaN
2017-02-07 11:15:00  0.424152
2017-02-07 11:30:00       NaN
2017-02-07 11:45:00       NaN
2017-02-07 12:00:00       NaN
2017-02-07 12:15:00       NaN
2017-02-07 12:30:00       NaN
2017-02-07 12:45:00       NaN
2017-02-07 13:00:00       NaN
2017-02-07 13:15:00       NaN
2017-02-07 13:30:00  0.019717
2017-02-07 13:45:00       NaN
2017-02-07 14:00:00       NaN
2017-02-07 14:15:00       NaN
2017-02-07 14:30:00       NaN
2017-02-07 14:45:00       NaN
2017-02-07 15:00:00  0.068249
2017-02-07 15:15:00       NaN
2017-02-07 15:30:00       NaN
2017-02-07 15:45:00       NaN
2017-02-07 16:00:00       NaN
2017-02-07 16:15:00       NaN
2017-02-07 16:30:00       NaN
2017-02-07 16:45:00       NaN
2017-02-07 17:00:00       NaN
2017-02-07 17:15:00       NaN
2017-02-07 17:30:00       NaN
2017-02-07 17:45:00       NaN
2017-02-07 18:00:00       NaN
2017-02-07 18:15:00       NaN
2017-02-07 18:30:00  0.031485

so this:

df['Date'] = pd.to_datetime(df['Date'].dt.strftime('%H:%M:%S'))

what this does is to extract as a string just the time using dt.strftime, we can then use to_datetime to make a datetime64 series where all dates are the same

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download