ahrf - 1 year ago 63

Python Question

I have a pandas dataframe indexed by time: (using python 3.X)

`2012-01-01 00:00:00 38406`

2012-01-01 01:00:00 36501

2012-01-01 02:00:00 35305

...

2012-12-31 09:00:00 43121

2012-12-31 10:00:00 44549

2012-12-31 11:00:00 45635

All I need is a list of lists containing the consumption values in hourly resolution:

`data =[[ 38406, 36501, 35305,...], [ x, y, z,...], [ ],....[ ]]`

In other words: I need a list for each day: containing 24 values (one value for each hour of the day). And they should all be packed together in a list.

So data[0] would give me a list containing the 24 consumption values of the first day.

What i have done yet:

For one month instead of the whole year, it could look like this:

`clusterInput=[None]*31`

for i in range(31):

a="2012-1-"+str(i+1)

subset=data[a]

clusterInput[i]=subset.values

For the whole year it can be done similar with more than one for loop or a switch case statement to take the different months (28/29/30/31 days) into account.

But I'm pretty sure there has to be an easier way, due to the time index.

I have also tried to use but without success

`[list(x) for x in dt.T.iterrows()] /tuples and /items`

I'd be very glad for some hints how to do this efficiently

Answer Source

Given a Pandas DataFrame whose index is a `DatetimeIndex`

, you could use the `groupby`

method to group all rows whose index has the same date:

```
import pandas as pd
import numpy as np
dates = pd.date_range('2012-01-01', periods=24*3, freq='H')
values = np.random.randint(100, size=len(dates))
df = pd.DataFrame({'amount':values}, index=dates)
data = [grp['amount'].values.tolist() for key, grp in df.groupby([df.index.date])]
```

However, if you just want to group every 24 rows together, you could use the grouper recipe:

```
data = zip(*[iter(df['amount'])]*24)
```

This is quicker since it does not have to pay any attention to the index, but it does rely on there being exactly 24 rows for each day, and that the rows to be grouped are contiguous.

```
In [36]: %timeit zip(*[iter(df['amount'])]*24)
100000 loops, best of 3: 15.2 µs per loop
In [37]: %timeit [grp['amount'].values.tolist() for key, grp in df.groupby([df.index.date])]
1000 loops, best of 3: 1.19 ms per loop
In [38]: 1190/15.2
Out[38]: 78.28947368421053
```