Huey Huey - 3 months ago 7
Python Question

Pandas group_by date and resample

I have a data frame that looks like this:

   A  B   C                        date
0  J  Y   2  2013-02-01 14:21:02.070030
1  X  X   0  2013-02-01 15:49:33.110849
2  Y  D   9  2013-02-01 06:47:19.369514
3  Y  C  17  2013-02-01 08:56:11.751781
4  3  J  21  2013-02-01 14:19:12.017232


I'd like to group by date and then count, ignoring the time-of-day part (hours, minutes, seconds, etc.).

It seems like something like this works:

df.set_index('date').resample('D').count()


Two questions:


  1. Why does that work? Is that the right way?

  2. Why doesn't something like
    df.group_by('date').resample('D').count()
    work?


Answer

resample is, in some sense, just a special case of groupby. Rather than grouping on distinct values, which is what groupby('date') would do, it groups on a time-based transformation of the index, which is why you need to set the index first. Alternatively, you could do:

df.groupby(pd.Grouper(key='date', freq='D')).count()
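
To make the difference concrete, here is a minimal sketch that rebuilds the five rows from the question and adds one made-up row on a second day (an assumption, included only so the result has more than one daily group). It contrasts grouping on distinct timestamp values with grouping on calendar days:

```python
import pandas as pd

# Rows from the question, plus one hypothetical row on 2013-02-02
# (not in the original data; added so there is more than one daily bin).
df = pd.DataFrame({
    "A": ["J", "X", "Y", "Y", "3", "J"],
    "B": ["Y", "X", "D", "C", "J", "Y"],
    "C": [2, 0, 9, 17, 21, 5],
    "date": pd.to_datetime([
        "2013-02-01 14:21:02.070030",
        "2013-02-01 15:49:33.110849",
        "2013-02-01 06:47:19.369514",
        "2013-02-01 08:56:11.751781",
        "2013-02-01 14:19:12.017232",
        "2013-02-02 09:00:00.000000",
    ]),
})

# Grouping on the raw column: every distinct timestamp is its own group.
by_value = df.groupby("date").count()

# Grouping on daily bins of the column: timestamps on the same day share a group.
by_day = df.groupby(pd.Grouper(key="date", freq="D")).count()

# The resample form from the question, which bins the index instead of a column.
by_resample = df.set_index("date").resample("D").count()

print(by_value)
print(by_day)
print(by_resample)
```

With this data, the plain groupby yields one group per row because no two timestamps are identical, while the Grouper and resample forms both collapse the rows into one group per day.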

In the upcoming version 0.19.0, you'll be able to write the above like this:

df.resample('D', on='date').count()
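
As a small illustration of the `on=` form, the sketch below uses three made-up rows (hypothetical data, not from the question) that skip a day. It assumes pandas 0.19.0 or later. One thing it shows is that time-based binning creates a group for every day in the range, including days with no rows:

```python
import pandas as pd

# Hypothetical data: two rows on 2013-02-01, none on 2013-02-02, one on 2013-02-03.
df = pd.DataFrame({
    "C": [2, 0, 9],
    "date": pd.to_datetime([
        "2013-02-01 14:21:02",
        "2013-02-01 15:49:33",
        "2013-02-03 06:47:19",
    ]),
})

# Resample directly on the 'date' column; no set_index call is needed.
daily = df.resample("D", on="date").count()

# The empty day in the middle still appears, with a count of 0.
print(daily)
```

If the empty days are not wanted, they can be filtered out afterwards, for example with `daily[daily["C"] > 0]`.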