Wasswa Samuel Wasswa Samuel -4 years ago 317
Python Question

Pandas timeseries groupby using TimeGrouper

I have a time series which is like this

Time Demand
Date
2014-01-01 0:00 2899.0
2014-01-01 0:15 2869.0
2014-01-01 0:30 2827.0
2014-01-01 0:45 2787.0
2014-01-01 1:00 2724.0
2014-01-01 1:15 2687.0
2014-01-01 1:30 2596.0
2014-01-01 1:45 2543.0
2014-01-01 2:00 2483.0


Its is in 15 minute increments. I want the average for every hour of everyday.So i tried something like this
df.groupby(pd.TimeGrouper(freq='H')).mean()
. It didn't work out quite right because it returned mostly
NaNs
.

Now my dataset has data like this for the whole year and I would like to calculate the mean for all the hours of all the months such that I have 24 points but the mean is for all hours of the year e.g. the first hour get the mean of the first hour for all the months. The expected output would be

2014 00:00:00 2884.0
2014 01:00:00 2807.0
2014 02:00:00 2705.5
2014 03:00:00 2569.5
..........
2014 23:00:00 2557.5


How can I achieve this?

Answer Source

I think you need first add Time column to index:

df.index = df.index + pd.to_timedelta(df.Time + ':00')
print (df)
                     Time  Demand
2014-01-01 00:00:00  0:00  2899.0
2014-01-01 00:15:00  0:15  2869.0
2014-01-01 00:30:00  0:30  2827.0
2014-01-01 00:45:00  0:45  2787.0
2014-01-01 01:00:00  1:00  2724.0
2014-01-01 01:15:00  1:15  2687.0
2014-01-01 01:30:00  1:30  2596.0
2014-01-01 01:45:00  1:45  2543.0
2014-01-01 02:00:00  2:00  2483.0

print (df.groupby(pd.Grouper(freq='H')).mean())
#same as
#print (df.groupby(pd.TimeGrouper(freq='H')).mean())
                     Demand
2014-01-01 00:00:00  2845.5
2014-01-01 01:00:00  2637.5
2014-01-01 02:00:00  2483.0

Thanks pansen for another idea resample:

print (df.resample("H").mean())
                     Demand
2014-01-01 00:00:00  2845.5
2014-01-01 01:00:00  2637.5
2014-01-01 02:00:00  2483.0

EDIT:

print (df)
            Time  Demand
Date                    
2014-01-01  0:00     1.0
2014-01-01  0:15     2.0
2014-01-01  0:30     4.0
2014-01-01  0:45     5.0
2014-01-01  1:00     1.0
2014-01-01  1:15     0.0
2015-01-01  1:30     1.0
2015-01-01  1:45     2.0
2015-01-01  2:00     3.0

df.index = df.index + pd.to_timedelta(df.Time + ':00')
print (df)
                     Time  Demand
2014-01-01 00:00:00  0:00     1.0
2014-01-01 00:15:00  0:15     2.0
2014-01-01 00:30:00  0:30     4.0
2014-01-01 00:45:00  0:45     5.0
2014-01-01 01:00:00  1:00     1.0
2014-01-01 01:15:00  1:15     0.0
2015-01-01 01:30:00  1:30     1.0
2015-01-01 01:45:00  1:45     2.0
2015-01-01 02:00:00  2:00     3.0

df1 = df.groupby([df.index.year, df.index.hour]).mean().reset_index()
df1.columns = ['year','hour','Demand']
print (df1)
   year  hour  Demand
0  2014     0     3.0
1  2014     1     0.5
2  2015     1     1.5
3  2015     2     3.0
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download