Sida Zhou - 3 months ago
Python Question

How to assign a sequential label to pandas groupby?

I start with the following pandas DataFrame. I want to group the rows by day and add a new column called 'label' that numbers each group sequentially. How do I do this?

import pandas as pd

df = pd.DataFrame({'val': [10, 40, 30, 10, 11, 13]}, index=pd.date_range('2016-01-01 00:00:00', periods=6, freq='12h'))
# df['label'] = df.groupby(pd.TimeGrouper('D')) # what do i do here???
print(df)


output:

                     val
2016-01-01 00:00:00   10
2016-01-01 12:00:00   40
2016-01-02 00:00:00   30
2016-01-02 12:00:00   10
2016-01-03 00:00:00   11
2016-01-03 12:00:00   13


desired output:

                     val  label
2016-01-01 00:00:00   10      1
2016-01-01 12:00:00   40      1
2016-01-02 00:00:00   30      2
2016-01-02 12:00:00   10      2
2016-01-03 00:00:00   11      3
2016-01-03 12:00:00   13      3

Answer

Try this:

import pandas as pd

df = pd.DataFrame({'val': [10, 40, 30, 10, 11, 13]}, index=pd.date_range('2016-01-01 00:00:00', periods=6, freq='12h'))

If you just want to group by date:

df['label'] = df.groupby(df.index.date).grouper.group_info[0] + 1
print(df)

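To group by time more generally, you can use pd.Grouper with a freq argument (older pandas versions called this pd.TimeGrouper, which was removed in pandas 1.0):

df['label'] = df.groupby(pd.Grouper(freq='D')).grouper.group_info[0] + 1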
print(df)

Both of the above should give you the following:

                      val  label
2016-01-01 00:00:00   10      1
2016-01-01 12:00:00   40      1
2016-01-02 00:00:00   30      2
2016-01-02 12:00:00   10      2
2016-01-03 00:00:00   11      3
2016-01-03 12:00:00   13      3

I think this is undocumented (or hard to find, at least). See "Get group id back into pandas dataframe" for more discussion.
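
Since grouper.group_info is an internal attribute, it may be worth knowing that pandas also has a documented GroupBy.ngroup() method that numbers each group. The following is only a sketch, assuming a pandas version that provides ngroup() (0.20.2 or later, as far as I know). The pd.Timedelta step and the pd.factorize cross-check are my own additions for illustration and are not part of the answer above.

```python
import pandas as pd

# Same frame as in the question. A Timedelta step is used here (an assumption,
# not the original code) to avoid version-specific frequency aliases.
df = pd.DataFrame(
    {'val': [10, 40, 30, 10, 11, 13]},
    index=pd.date_range('2016-01-01 00:00:00', periods=6,
                        freq=pd.Timedelta(hours=12)),
)

# ngroup() numbers the groups from 0 in sorted key order,
# so add 1 to get 1-based labels.
df['label'] = df.groupby(df.index.date).ngroup() + 1

# Cross-check: factorize numbers the distinct days in order of first
# appearance, which agrees here because the index is already sorted.
codes, _ = pd.factorize(df.index.normalize())
assert (df['label'] == codes + 1).all()

print(df)
```

Note that ngroup() follows the sorted order of the group keys, while pd.factorize follows order of first appearance, so the two can differ when the index is not sorted by time.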
