I have a pandas DataFrame indexed by time (using Python 3.x):
2012-01-01 00:00:00 38406
2012-01-01 01:00:00 36501
2012-01-01 02:00:00 35305
...
2012-12-31 09:00:00 43121
2012-12-31 10:00:00 44549
2012-12-31 11:00:00 45635
I want to convert this into a nested list with one inner list of values per day:
data = [[38406, 36501, 35305, ...], [x, y, z, ...], [ ], ..., [ ]]
clusterInput = [None] * 31
for i in range(31):
    a = "2012-1-" + str(i + 1)
    subset = data[a]
    clusterInput[i] = subset.values
I have also tried [list(x) for x in df.T.iterrows()], along with the itertuples and items variants, without success.
Given a Pandas DataFrame whose index is a DatetimeIndex, you could use the groupby method to group all rows whose index has the same date:
import pandas as pd
import numpy as np
dates = pd.date_range('2012-01-01', periods=24*3, freq='H')
values = np.random.randint(100, size=len(dates))
df = pd.DataFrame({'amount':values}, index=dates)
data = [grp['amount'].values.tolist() for key, grp in df.groupby([df.index.date])]
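As a quick sanity check, here is a self-contained version of the snippet above (re-creating the synthetic frame, and using freq='h', the spelling newer pandas prefers over 'H'): with 72 hourly rows starting at midnight, the groupby should yield exactly 3 days of 24 values each.

```python
import numpy as np
import pandas as pd

# Rebuild the synthetic frame: 3 full days of hourly values.
dates = pd.date_range('2012-01-01', periods=24 * 3, freq='h')
values = np.random.randint(100, size=len(dates))
df = pd.DataFrame({'amount': values}, index=dates)

# Group rows sharing a calendar date and collect each day's values.
data = [grp['amount'].values.tolist() for key, grp in df.groupby(df.index.date)]

print(len(data))     # -> 3 (one inner list per day)
print(len(data[0]))  # -> 24 (one value per hour)
```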
However, if you just want to group every 24 rows together, you could use the grouper recipe (on Python 3, zip returns an iterator, so wrap it in list() to materialize the groups):
data = list(zip(*[iter(df['amount'])] * 24))
This is quicker since it does not have to pay any attention to the index, but it does rely on there being exactly 24 rows for each day, and that the rows to be grouped are contiguous.
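One caveat worth demonstrating: when the row count is not a multiple of 24, zip silently drops the trailing partial group, whereas itertools.zip_longest keeps it padded with a fill value. A minimal sketch with 26 illustrative values (one full day plus a partial one):

```python
from itertools import zip_longest

# 26 hourly values: one full day plus a 2-hour partial day.
amounts = list(range(26))

# zip(...) stops at the first incomplete group, discarding the remainder ...
full_days = list(zip(*[iter(amounts)] * 24))

# ... while zip_longest pads the incomplete group with None instead.
all_days = list(zip_longest(*[iter(amounts)] * 24))

print(len(full_days))  # -> 1 (partial day dropped)
print(len(all_days))   # -> 2 (partial day kept, padded)
```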
In [36]: %timeit zip(*[iter(df['amount'])]*24)
100000 loops, best of 3: 15.2 µs per loop
In [37]: %timeit [grp['amount'].values.tolist() for key, grp in df.groupby([df.index.date])]
1000 loops, best of 3: 1.19 ms per loop
In [38]: 1190/15.2
Out[38]: 78.28947368421053
So the grouper recipe is roughly 78× faster here.