Tom V - 2 years ago 157
Python Question

# Way of easily finding the average of every nth element over a window of size k in a pandas.Series? (not the rolling mean)

The motivation here is to take a time series and get the average activity throughout a sub-period (day, week).

It is possible to reshape an array and take the mean over the y axis to achieve this, similar to this answer (but using axis=2):

Averaging over every n elements of a numpy array

but I'm looking for something that can handle arrays whose length N is not a multiple of k (N % k != 0) and that does not solve the issue by reshaping and padding with ones or zeros (e.g. numpy.resize), i.e. it takes the average over the existing data only.

For example, take the array

`[2,2,3,2,2,3,2,2,3,6]`

of length N=10, which is not divisible by k=3. What I want is to take the average over the columns of a reshaped array with mismatched dimensions:

`In: [[2,2,3], [2,2,3], [2,2,3], [6]], k=3`

`Out: [3,2,3]`

rather than the result of zero-padding the last row:

`In: [[2,2,3], [2,2,3], [2,2,3], [6,0,0]], k=3`

`Out: [3,1.5,2.25]`

Thank you.

You can do it by padding, reshaping, and then working out how many real elements each column's sum should be divided by:

``````
>>> import numpy as np
>>> a = np.array([2,2,3,2,2,3,2,2,3,6])
>>> k = 3
``````

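``````
>>> # (-a.size) % k is 0 when a.size is already a multiple of k,
>>> # so no extra all-zero row is appended in that case
>>> b = np.pad(a, (0, (-a.size) % k), mode='constant').reshape(-1, k)
>>> b
array([[2, 2, 3],
       [2, 2, 3],
       [2, 2, 3],
       [6, 0, 0]])
``````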

``````
>>> c = a.size // k # 3
>>> d = (np.arange(k) + c * k) < a.size # [True, False, False]
``````

The expression `np.arange(k) + c * k` produces the flat indices of the last row, `[9, 10, 11]`. Comparing them with the size of `a` (10) yields the boolean mask above, which marks the columns that hold real data in the last row.
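To make the intermediate values concrete, here is a small sketch that spells out the mask and the resulting per-column divisors for the example array. The names `last_row_idx` and `counts` are introduced here only for illustration; they are not part of the steps above.

```python
import numpy as np

a = np.array([2, 2, 3, 2, 2, 3, 2, 2, 3, 6])
k = 3

c = a.size // k                      # number of complete rows: 3
last_row_idx = np.arange(k) + c * k  # flat indices of the last row: 9, 10, 11
d = last_row_idx < a.size            # True only where the last row has real data
counts = c + d                       # real elements per column: 4, 3, 3
```

Only index 9 exists in `a`, so only the first column gets one extra element on top of the three complete rows.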

Then divide the column sums by the per-column counts:

``````
>>> b.sum(0) / (c + 1.0 * d)
array([ 3.,  2.,  3.])
``````

The above divides the first column by 4 (`c + 1.0 * True`) and the other columns by 3. Because every step is a vectorized NumPy operation, it scales well to large arrays.
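For comparison, the same column means can be obtained without any padding by grouping elements on their position modulo `k`. This is a different technique from the one above, shown as a minimal sketch; the function name `column_means_bincount` is hypothetical, and the sketch assumes the input has at least `k` elements so that no column is empty.

```python
import numpy as np

def column_means_bincount(a, k):
    """Mean of the elements at each position modulo k, without padding."""
    a = np.asarray(a, dtype=float)
    pos = np.arange(a.size) % k                      # column index of each element
    sums = np.bincount(pos, weights=a, minlength=k)  # per-column sums
    counts = np.bincount(pos, minlength=k)           # per-column element counts
    return sums / counts

result = column_means_bincount([2, 2, 3, 2, 2, 3, 2, 2, 3, 6], 3)
# result holds 3.0, 2.0, 3.0
```

Here `np.bincount` does the summing and the counting, so the divisor for each column falls out directly instead of being derived from a mask.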

Everything can be written more compactly; I have shown each step separately for clarity.
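As a rough sketch of what a compact version might look like, the steps can be folded into one helper. The name `column_means` is hypothetical. It pads by `(-a.size) % k` so that no all-zero row is added when the length is already a multiple of `k`, and it assumes Python 3 division. Since the question asks about a `pandas.Series`, the sketch also shows a pandas alternative (not the padding method) that groups by position modulo `k`.

```python
import numpy as np
import pandas as pd

def column_means(a, k):
    """Column means of `a` laid out k per row, ignoring the zero padding."""
    a = np.asarray(a)
    # pad the tail with zeros only as far as the next multiple of k
    b = np.pad(a, (0, (-a.size) % k), mode='constant').reshape(-1, k)
    # complete rows, plus one for each column the partial last row reaches
    counts = a.size // k + (np.arange(k) < a.size % k)
    return b.sum(axis=0) / counts

values = [2, 2, 3, 2, 2, 3, 2, 2, 3, 6]
means = column_means(values, 3)  # 3.0, 2.0, 3.0

# pandas alternative: group the Series by position modulo k
s = pd.Series(values)
by_position = s.groupby(np.arange(len(s)) % 3).mean()
```

The `groupby` variant returns a Series indexed by position within the window (0 to k-1) and needs no padding or mask.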
