Lukasz Lukasz - 2 months ago 7
Python Question

Pandas Grouping By Datetime

I'm attempting to count the number of users that login to a system on an hourly basis on a given date. The date I have resembles:

df=

Name Date
name_1 2012-07-12 22:20:00
name_1 2012-07-16 22:19:00
name_1 2013-12-16 17:50:00
...
name_2 2010-01-11 19:54:00
name_2 2010-02-06 12:10:00
...
name_2 2012-07-18 22:12:00
...
name_5423 2013-11-23 10:21:00


since I'm not interested in the users name I've deleted that column. I manage to create a grouped data structure and a new dataframe
df2
using the following command

grp = df.groupby(by=[df.Date.map(lambda x: (x.year, x.month, x.day, x.hour))])
df2 = pd.DataFrame({'Count' : grp.size()}).reset_index()


grp
converts the
datetime
type into a tuple of
(year, month, day, hour)
.

I'm able to convert it back to a
datetime
type using a
for
loop

for i in range(len(df2)):
proper_date = datetime.datetime(*df2['Date'][i])
df2.set_value(i, 'Date', proper_date)


What I'm wondering is if there is a better/more efficient way of going about this?

Answer

You can groupby by column Date converted to h and aggregate size:

print (df.Date.values.astype('datetime64[h]'))
['2012-07-12T22+0200' '2012-07-16T22+0200' '2013-12-16T17+0100'
 '2010-01-11T19+0100' '2010-02-06T12+0100' '2012-07-18T22+0200'
 '2013-11-23T10+0100']

print (df.Name.groupby([df.Date.values.astype('datetime64[h]')]).size())
2010-01-11 19:00:00    1
2010-02-06 12:00:00    1
2012-07-12 22:00:00    1
2012-07-16 22:00:00    1
2012-07-18 22:00:00    1
2013-11-23 10:00:00    1
2013-12-16 17:00:00    1
dtype: int64

Another solution:

print (df.Date.values.astype('<M8[h]'))
['2012-07-12T22+0200' '2012-07-16T22+0200' '2013-12-16T17+0100'
 '2010-01-11T19+0100' '2010-02-06T12+0100' '2012-07-18T22+0200'
 '2013-11-23T10+0100']

print (df.Name.groupby([df.Date.values.astype('<M8[h]')]).size())
2010-01-11 19:00:00    1
2010-02-06 12:00:00    1
2012-07-12 22:00:00    1
2012-07-16 22:00:00    1
2012-07-18 22:00:00    1
2013-11-23 10:00:00    1
2013-12-16 17:00:00    1
dtype: int64
Comments