mikeL - 1 year ago 90
Python Question

# Get average counts per minute by hour

I have a DataFrame with a timestamp as the index and a column of labels:

``````
from datetime import datetime
import pandas as pd

df = pd.DataFrame({'time': [
    datetime(2015,11,2,4,41,10), datetime(2015,11,2,4,41,39), datetime(2015,11,2,4,41,47),
    datetime(2015,11,2,4,41,59), datetime(2015,11,2,4,42,4), datetime(2015,11,2,4,42,11),
    datetime(2015,11,2,4,42,15), datetime(2015,11,2,4,42,30), datetime(2015,11,2,4,42,39),
    datetime(2015,11,2,4,42,41), datetime(2015,11,2,5,2,9), datetime(2015,11,2,5,2,10),
    datetime(2015,11,2,5,2,16), datetime(2015,11,2,5,2,29), datetime(2015,11,2,5,2,51),
    datetime(2015,11,2,5,9,1), datetime(2015,11,2,5,9,21), datetime(2015,11,2,5,9,31),
    datetime(2015,11,2,5,9,40), datetime(2015,11,2,5,9,55)],
    'Label': [2,0,0,0,1,0,0,1,1,1,1,3,0,0,3,0,1,0,1,1]}).set_index(['time'])
``````

I want to get the average number of times that a label appears in a distinct minute
in a distinct hour.

For example, Label 0 appears 3 times in hour 4 in minute 41 and 2 times in hour 4
in minute 42. It appears 2 times in hour 5 in minute 2 and 2 times in hour 5 in minute 9.
So its average count per minute in hour 4 is

``````
(2+3)/2=2.5
``````

and its average count per minute in hour 5 is

``````
(2+2)/2=2
``````
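To make the definition concrete, here is a minimal plain-Python sketch of the same arithmetic. It is only an illustration: the `rows` list is the sample data retyped as `(hour, minute, second, label)` tuples, and `avg_per_minute` is a hypothetical helper name, not something from pandas.

```python
from collections import Counter, defaultdict

# Sample data retyped as (hour, minute, second, label) tuples (an assumption
# made for brevity; the real data lives in the DataFrame shown earlier).
rows = [
    (4, 41, 10, 2), (4, 41, 39, 0), (4, 41, 47, 0), (4, 41, 59, 0),
    (4, 42, 4, 1), (4, 42, 11, 0), (4, 42, 15, 0), (4, 42, 30, 1),
    (4, 42, 39, 1), (4, 42, 41, 1),
    (5, 2, 9, 1), (5, 2, 10, 3), (5, 2, 16, 0), (5, 2, 29, 0), (5, 2, 51, 3),
    (5, 9, 1, 0), (5, 9, 21, 1), (5, 9, 31, 0), (5, 9, 40, 1), (5, 9, 55, 1),
]

# Count how often each label occurs in each (hour, minute) bucket.
counts = Counter((h, m, label) for h, m, _s, label in rows)

# Collect the distinct minutes that were observed within each hour.
minutes_by_hour = defaultdict(set)
for h, m, _s, _label in rows:
    minutes_by_hour[h].add(m)


def avg_per_minute(label, hour):
    """Mean count of `label` over the distinct minutes seen in `hour`."""
    minutes = minutes_by_hour[hour]
    # Counter returns 0 for buckets where the label never appeared.
    return sum(counts[(hour, m, label)] for m in minutes) / len(minutes)


print(avg_per_minute(0, 4))  # 2.5
print(avg_per_minute(0, 5))  # 2.0
```

Note that a minute in which a label never appears still counts toward the denominator, as long as some other label appeared in that minute.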

The output I am looking for is

``````
Hour 1
Label  avg
0      2.5
1      2
2      0.5
3      0

Hour 2
Label  avg
0      2
1      2
2      0
3      1
``````

What I have so far is

``````
df['hour'] = df.index.hour

hour_grp = df.groupby(['hour'], as_index=False)
``````

Then I can do something like

``````
res = []
for key, value in hour_grp:
    res.append(value)
``````

and then group by minute

``````
# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq=...) replaces it
res[0].groupby(pd.Grouper(freq='1min'))['Label'].value_counts()
``````

but this is where I'm stuck, not to mention that it is not very efficient.

Access the minute of the `DatetimeIndex`:

``````
mn = df.index.minute
``````

Access the hour of the `DatetimeIndex`:

``````
hr = df.index.hour
``````

Group by both of them and compute `value_counts` on `Label`. Then `unstack`, filling missing values with 0, `groupby` the hour level (`level=1`, since `hr` is the second key), and take the mean.

``````
df.groupby([mn, hr])['Label'].value_counts().unstack(fill_value=0).groupby(level=1).mean()
``````
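The one-liner above can be unpacked into steps. The following is a sketch rather than the canonical form: it rebuilds the sample data so that it runs on its own, and the level names `minute` and `hour` as well as the intermediate variable names are assumptions added for readability.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question, written as (hour, minute, second) tuples.
hms = [
    (4, 41, 10), (4, 41, 39), (4, 41, 47), (4, 41, 59), (4, 42, 4),
    (4, 42, 11), (4, 42, 15), (4, 42, 30), (4, 42, 39), (4, 42, 41),
    (5, 2, 9), (5, 2, 10), (5, 2, 16), (5, 2, 29), (5, 2, 51),
    (5, 9, 1), (5, 9, 21), (5, 9, 31), (5, 9, 40), (5, 9, 55),
]
labels = [2, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 3, 0, 0, 3, 0, 1, 0, 1, 1]
times = [datetime(2015, 11, 2, h, m, s) for h, m, s in hms]
df = pd.DataFrame({'time': times, 'Label': labels}).set_index('time')

# Name the two group keys so the resulting index levels are easy to tell
# apart (the names are illustrative, not required by pandas).
minute = df.index.minute.rename('minute')
hour = df.index.hour.rename('hour')

# Step 1: count each label within every (minute, hour) bucket.
counts = df.groupby([minute, hour])['Label'].value_counts()

# Step 2: pivot labels into columns; a label absent from a bucket becomes 0.
per_minute = counts.unstack(fill_value=0)

# Step 3: average the per-minute counts within each hour.
result = per_minute.groupby(level='hour').mean()
print(result)
```

For the sample data this should yield 2.5, 2.0, 0.5, 0.0 for hour 4 and 2.0, 2.0, 0.0, 1.0 for hour 5 (labels 0 to 3). Label 1 comes out as 2.0 in hour 5 because it appears once in minute 2 and three times in minute 9.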
