ahrf ahrf - 1 year ago 49
Python Question

How to generate daily lists from pandas time series

I have a pandas dataframe indexed by time: (using python 3.X)

2012-01-01 00:00:00 38406
2012-01-01 01:00:00 36501
2012-01-01 02:00:00 35305


2012-12-31 09:00:00 43121
2012-12-31 10:00:00 44549
2012-12-31 11:00:00 45635

All I need is a list of lists containing the consumption values in hourly resolution:

data =[[ 38406, 36501, 35305,...], [ x, y, z,...], [ ],....[ ]]

In other words: I need a list for each day: containing 24 values (one value for each hour of the day). And they should all be packed together in a list.

So data[0] would give me a list containing the 24 consumption values of the first day.

What i have done yet:

For one month instead of the whole year, it could look like this:

for i in range(31):

For the whole year it can be done similar with more than one for loop or a switch case statement to take the different months (28/29/30/31 days) into account.

But I'm pretty sure there has to be an easier way, due to the time index.
I have also tried to use but without success

[list(x) for x in dt.T.iterrows()] /tuples and /items

I'd be very glad for some hints how to do this efficiently


Given a Pandas DataFrame whose index is a DatetimeIndex, you could use the groupby method to group all rows whose index has the same date:

import pandas as pd
import numpy as np

dates = pd.date_range('2012-01-01', periods=24*3, freq='H')
values = np.random.randint(100, size=len(dates))
df = pd.DataFrame({'amount':values}, index=dates)

data = [grp['amount'].values.tolist() for key, grp in df.groupby([df.index.date])]

However, if you just want to group every 24 rows together, you could use the grouper recipe:

data = zip(*[iter(df['amount'])]*24)

This is quicker since it does not have to pay any attention to the index, but it does rely on there being exactly 24 rows for each day, and that the rows to be grouped are contiguous.

In [36]: %timeit zip(*[iter(df['amount'])]*24)
100000 loops, best of 3: 15.2 ┬Ás per loop

In [37]: %timeit [grp['amount'].values.tolist() for key, grp in df.groupby([df.index.date])]
1000 loops, best of 3: 1.19 ms per loop

In [38]: 1190/15.2
Out[38]: 78.28947368421053