Adrian Padin Adrian Padin - 13 days ago 8
Python Question

Average every four rows but preserve timestamp values

What is the fastest way in pandas to average every four rows in a dataframe?

My problem is that I have a program recording data every 15 seconds which looks like this:

1477892758, 10
1477892773, 20
1477892788, 30
1477892803, 40
1477892818, 15
1477892833, 25
1477892848, 35
1477892863, 45


...etc where the first column is the Unix timestamp and the second is the data (my real data set has many more columns).

I only want to have data for once per minute. How can I average the data for every four rows but still preserve a meaningful timestamp? Ideally the timestamp would be the first in the series, so I would have something like this in the end:

1477892758, 25
1477892818, 30

Answer

You can use groupby by index floor divided by 4 and aggregate for first column first and for second mean:

df = pd.DataFrame({'B': {0: 10, 1: 20, 2: 30, 3: 40, 4: 15, 5: 25, 6: 35, 7: 45}, 'A': {0: 1477892758, 1: 1477892773, 2: 1477892788, 3: 1477892803, 4: 1477892818, 5: 1477892833, 6: 1477892848, 7: 1477892863}})
print (df)
            A   B
0  1477892758  10
1  1477892773  20
2  1477892788  30
3  1477892803  40
4  1477892818  15
5  1477892833  25
6  1477892848  35
7  1477892863  45
print (df.index // 4)
Int64Index([0, 0, 0, 0, 1, 1, 1, 1], dtype='int64')

print (df.groupby(df.index // 4).agg({'A':'first', 'B':'mean'})[['A','B']])
            A   B
0  1477892758  25
1  1477892818  30

If first column is index use reset_index first:

df = pd.DataFrame({'A': [10, 20, 30, 40, 15, 25, 35, 45]}, index=[1477892758, 1477892773, 1477892788, 1477892803, 1477892818, 1477892833, 1477892848, 1477892863])
print (df)
             A
1477892758  10
1477892773  20
1477892788  30
1477892803  40
1477892818  15
1477892833  25
1477892848  35
1477892863  45
df.reset_index(inplace=True)
df.columns = list('AB')
print (df)
            A   B
0  1477892758  10
1  1477892773  20
2  1477892788  30
3  1477892803  40
4  1477892818  15
5  1477892833  25
6  1477892848  35
7  1477892863  45


print (df.groupby(df.index // 4).agg({'A':'first', 'B':'mean'})[['A','B']])
            A   B
0  1477892758  25
1  1477892818  30
Comments