Neha B - 1 month ago
Python Question

Python: How to calculate the average for each hour in a CSV file?

I want to calculate the average for each hour from a CSV file.

Below is my data set:

Timestamp Temperature
9/1/2016 0:00:08 53.8
9/1/2016 0:00:38 53.8
9/1/2016 0:01:08 53.8
9/1/2016 0:01:38 53.8
9/1/2016 0:02:08 53.8
9/1/2016 0:02:38 54.1
9/1/2016 0:03:08 54.1
9/1/2016 0:03:38 54.1
9/1/2016 0:04:38 54
9/1/2016 0:05:38 54
9/1/2016 0:06:08 54
9/1/2016 0:06:38 54
9/1/2016 0:07:08 54
9/1/2016 0:07:38 54
9/1/2016 0:08:08 54.1
9/1/2016 0:08:38 54.1
9/1/2016 0:09:38 54.1
9/1/2016 0:10:32 54
9/1/2016 0:11:02 54
9/1/2016 0:11:32 54
9/1/2016 0:00:08 54
9/2/2016 0:00:20 32
9/2/2016 0:00:50 32
9/2/2016 0:01:20 32
9/2/2016 0:01:50 32
9/2/2016 0:02:20 32
9/2/2016 0:02:50 32
9/2/2016 0:03:20 32
9/2/2016 0:03:50 32
9/2/2016 0:04:20 32
9/2/2016 0:04:50 32
9/2/2016 0:05:20 32
9/2/2016 0:05:50 32
9/2/2016 0:06:20 32
9/2/2016 0:06:50 32
9/2/2016 0:07:20 32
9/2/2016 0:07:50 32


Here is my code for calculating the per-day average, but I want the per-hour average:

from datetime import datetime
import pandas

def same_day(date_string):  # keep only month and day (drops year and time)
    return datetime.strptime(date_string, "%m/%d/%Y %H:%M:%S").strftime('%m%d')

df = pandas.read_csv('/home/kk/Desktop/cal_Avg.csv', index_col=0, usecols=[0, 1], names=['Timestamp', 'Discharge'], converters={'Timestamp': same_day})

# Rows that share the same month-day label are averaged together
print(df.groupby(level=0).mean())



My desired output is like:

Timestamp Temp * Avg

9/1/2016 0:00:08 53.8

9/1/2016 0:00:38 53.8 ?avg for this hour

9/1/2016 0:01:08 53.8

9/1/2016 0:01:38 53.8 ?avg for this hour

9/1/2016 0:02:08 53.8

9/1/2016 0:02:38 54.1

Now I want the average for specific hours and minutes.

Desired output:

Here I am printing the output for only 5 hours for the dates 01-09-2016 and 02-09-2016:
010900 54.362727 45.497273

010901 54.723276 45.068103

010902 54.746847 45.370270

010903 54.833913 44.931304

010904 54.971053 44.835088

010905 55.519444 44.459259

020901 31.742553 55.640426

020902 31.495556 55.655556

020903 31.304348 55.442609

020904 31.200000 55.437273

020905 31.294382 55.442697


That is, the average for a specific date and its specific hours.
How can I achieve this?

Answer

I think you first need read_csv with the parameter index_col=[0] to read the first column into the index, and parse_dates=[0] to parse that column into a DatetimeIndex:

import pandas as pd
df = pd.read_csv('filename', index_col=[0], parse_dates=[0], usecols=[0,1])
print (df)
                     Temperature
Timestamp                       
2016-09-01 00:00:08         53.8
2016-09-01 00:00:38         53.8
2016-09-01 00:01:08         53.8
2016-09-01 00:01:38         53.8
2016-09-01 00:02:08         53.8
2016-09-01 00:02:38         54.1
2016-09-01 00:03:08         54.1
...
...
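
To try the read_csv call without a file on disk, the snippet below is a minimal sketch that reads a few rows from an in-memory string. The comma delimiter and the sample rows (including the made-up 1:15:00 reading) are assumptions for illustration; the real file may be laid out differently.

```python
import io
import pandas as pd

# Hypothetical comma-separated sample; the real file's delimiter may differ.
csv_text = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:00:38,53.8
9/1/2016 1:15:00,54.1
9/2/2016 0:00:20,32
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=[0], parse_dates=[0], usecols=[0, 1])

# parse_dates turns the first column into a DatetimeIndex,
# which resample and the .hour / .strftime accessors rely on.
print(type(df.index).__name__)
print(df.index.min(), df.index.max())
```

If the index prints as a plain Index of strings instead, the timestamps were not parsed and the time-based steps will not work.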

Then use resample by hours and aggregate with Resampler.mean. Note that hours with no rows in the DatetimeIndex come back as NaN:

print (df.resample('H').mean())
                     Temperature
Timestamp                       
2016-09-01 00:00:00    53.980952
2016-09-01 01:00:00          NaN
2016-09-01 02:00:00          NaN
2016-09-01 03:00:00          NaN
2016-09-01 04:00:00          NaN
2016-09-01 05:00:00          NaN
2016-09-01 06:00:00          NaN
2016-09-01 07:00:00          NaN
2016-09-01 08:00:00          NaN
2016-09-01 09:00:00          NaN
2016-09-01 10:00:00          NaN
2016-09-01 11:00:00          NaN
2016-09-01 12:00:00          NaN
2016-09-01 13:00:00          NaN
2016-09-01 14:00:00          NaN
2016-09-01 15:00:00          NaN
2016-09-01 16:00:00          NaN
2016-09-01 17:00:00          NaN
2016-09-01 18:00:00          NaN
2016-09-01 19:00:00          NaN
2016-09-01 20:00:00          NaN
2016-09-01 21:00:00          NaN
2016-09-01 22:00:00          NaN
2016-09-01 23:00:00          NaN
2016-09-02 00:00:00    32.000000
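
If the empty hours are not wanted, one option is to drop them after resampling. This is a minimal sketch on a small stand-in frame (the four readings are made up for illustration); recent pandas releases prefer the lowercase 'h' alias, which is what it uses.

```python
import pandas as pd

# Small stand-in for the parsed frame: two readings on each day, all in hour 0.
idx = pd.to_datetime(["2016-09-01 00:00:08", "2016-09-01 00:00:38",
                      "2016-09-02 00:00:20", "2016-09-02 00:00:50"])
df = pd.DataFrame({"Temperature": [53.8, 54.2, 32.0, 32.0]}, index=idx)

# One row per hour between the first and last timestamp; empty hours are NaN.
hourly = df.resample("h").mean()

# Keep only the hours that actually had readings.
hourly_nonempty = hourly.dropna()
print(hourly_nonempty)
```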

Another solution is to remove the minutes and seconds by casting the index to hour precision, then group by the resulting array:

print (df.index.values.astype('<M8[h]'))
['2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00']

print (df.groupby([df.index.values.astype('<M8[h]')]).mean())
            Temperature
2016-09-01    53.980952
2016-09-02    32.000000

Also, if you need the mean by month, day and hour, it is possible to group by DatetimeIndex.strftime:

print (df.index.strftime('%m%d%H'))
['090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
 '090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
 '090100' '090100' '090100' '090100' '090100' '090200' '090200' '090200'
 '090200' '090200' '090200' '090200' '090200' '090200' '090200' '090200'
 '090200' '090200' '090200' '090200' '090200']

print (df.groupby([df.index.strftime('%m%d%H')]).mean())
        Temperature
090100    53.980952
090200    32.000000
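
The labels in the question's desired output (010900, 010901, ...) look like day, month, hour rather than month, day, hour. Assuming that reading is right, swapping the format codes gives that layout. A minimal sketch on made-up readings:

```python
import pandas as pd

idx = pd.to_datetime(["2016-09-01 00:00:08", "2016-09-01 01:10:00",
                      "2016-09-02 00:00:20"])
df = pd.DataFrame({"Temperature": [53.8, 54.1, 32.0]}, index=idx)

# Day first, then month, then hour: 1 September, hour 0 becomes '010900'.
labels = df.index.strftime("%d%m%H")
by_label = df.groupby(labels).mean()
print(by_label)
```

Because the labels are strings, the groups sort lexicographically, so day-first labels will not be in chronological order once the data spans more than one month.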

Or, if you need the mean only by hour of day (pooling all dates together), group by DatetimeIndex.hour:

print (df.index.hour)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

print (df.groupby([df.index.hour]).mean())
   Temperature
0    44.475676
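
The question also sketches an output where each original row keeps its timestamp and gets the average for its hour alongside it. One way to do that, offered as a sketch rather than as part of the original answer, is groupby with transform. The column name HourlyAvg and the sample readings are made up for illustration.

```python
import pandas as pd

idx = pd.to_datetime(["2016-09-01 00:00:08", "2016-09-01 00:00:38",
                      "2016-09-01 01:00:10", "2016-09-02 00:00:20"])
df = pd.DataFrame({"Temperature": [53.8, 54.2, 55.0, 32.0]}, index=idx)

# Truncate each timestamp to the start of its hour (the date is kept).
hour_start = df.index.floor("h")

# transform('mean') returns one value per original row, so the result
# can be stored as a new column next to the raw readings.
df["HourlyAvg"] = df.groupby(hour_start)["Temperature"].transform("mean")
print(df)
```

Unlike grouping by DatetimeIndex.hour, flooring keeps the date, so hour 0 on 9/1 and hour 0 on 9/2 are averaged separately.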