Python Question

# Average every four rows but preserve timestamp values

What is the fastest way in pandas to average every four rows in a dataframe?

My problem is that I have a program recording data every 15 seconds which looks like this:

``````1477892758, 10
1477892773, 20
1477892788, 30
1477892803, 40
1477892818, 15
1477892833, 25
1477892848, 35
1477892863, 45
``````

...etc where the first column is the Unix timestamp and the second is the data (my real data set has many more columns).

I only want to have data for once per minute. How can I average the data for every four rows but still preserve a meaningful timestamp? Ideally the timestamp would be the first in the series, so I would have something like this in the end:

``````1477892758, 25
1477892818, 30
``````

You can use `groupby` by `index` floor divided by `4` and `aggregate` for first column `first` and for second `mean`:

``````df = pd.DataFrame({'B': {0: 10, 1: 20, 2: 30, 3: 40, 4: 15, 5: 25, 6: 35, 7: 45}, 'A': {0: 1477892758, 1: 1477892773, 2: 1477892788, 3: 1477892803, 4: 1477892818, 5: 1477892833, 6: 1477892848, 7: 1477892863}})
print (df)
A   B
0  1477892758  10
1  1477892773  20
2  1477892788  30
3  1477892803  40
4  1477892818  15
5  1477892833  25
6  1477892848  35
7  1477892863  45
``````
``````print (df.index // 4)
Int64Index([0, 0, 0, 0, 1, 1, 1, 1], dtype='int64')

print (df.groupby(df.index // 4).agg({'A':'first', 'B':'mean'})[['A','B']])
A   B
0  1477892758  25
1  1477892818  30
``````

If first column is `index` use `reset_index` first:

``````df = pd.DataFrame({'A': [10, 20, 30, 40, 15, 25, 35, 45]}, index=[1477892758, 1477892773, 1477892788, 1477892803, 1477892818, 1477892833, 1477892848, 1477892863])
print (df)
A
1477892758  10
1477892773  20
1477892788  30
1477892803  40
1477892818  15
1477892833  25
1477892848  35
1477892863  45
``````
``````df.reset_index(inplace=True)
df.columns = list('AB')
print (df)
A   B
0  1477892758  10
1  1477892773  20
2  1477892788  30
3  1477892803  40
4  1477892818  15
5  1477892833  25
6  1477892848  35
7  1477892863  45

print (df.groupby(df.index // 4).agg({'A':'first', 'B':'mean'})[['A','B']])
A   B
0  1477892758  25
1  1477892818  30
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download