The motivation here is to take a time series and get the average activity throughout a sub-period (day, week).
It is possible to reshape an array and take the mean over the y axis to achieve this, similar to this answer (but using axis=2):
Averaging over every n elements of a numpy array
but I'm looking for something that can handle arrays whose length N is not a multiple of k (N % k != 0) and that does not solve the issue by reshaping and padding with ones or zeros (e.g. numpy.resize), i.e. it should take the average over the existing data only.
E.g. start with a sequence
], k =3
[6,0,0]], k =3
You can do this easily by padding, reshaping, and working out how many real elements each column contains, so that each column sum is divided by the right count:
>>> import numpy as np
>>> a = np.array([2,2,3,2,2,3,2,2,3,6])
>>> k = 3
>>> b = np.pad(a, (0, k - a.size%k), mode='constant').reshape(-1, k)
>>> b
array([[2, 2, 3],
       [2, 2, 3],
       [2, 2, 3],
       [6, 0, 0]])
Then create a mask:
>>> c = a.size // k  # 3
>>> d = (np.arange(k) + c * k) < a.size  # [True, False, False]
The first part of d, np.arange(k) + c * k, creates an array containing [9, 10, 11], the flat indices of the last row. Comparing it with the size of a (10) generates the boolean mask mentioned above.
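To make the mask less abstract, the intermediate values can be inspected one at a time. This is only an illustrative sketch that reuses a and k from above; the names idx and counts are introduced here for clarity and are not part of the original steps.

```python
import numpy as np

a = np.array([2, 2, 3, 2, 2, 3, 2, 2, 3, 6])
k = 3

c = a.size // k              # number of complete rows: 3
idx = np.arange(k) + c * k   # flat indices of the last, partial row: 9, 10, 11
d = idx < a.size             # True only where that index holds real data
counts = c + d               # real elements per column (idx and counts are hypothetical names)

print(idx)     # indices 9, 10, 11
print(d)       # only the first entry is True
print(counts)  # 4 for the first column, 3 for the others
```

Only index 9 exists in a, so only the first column gets one extra element in its count.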
And divide it:
>>> b.sum(0) / (c + 1.0 * d)
array([ 3.,  2.,  3.])
The above divides the first column by 4 (c + 1 * True) and the rest by 3. This is vectorized NumPy, so it scales well to large arrays.
Everything can be written more compactly; I just show all the steps here to make it clearer.
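As one possible compact form, the steps could be wrapped in a small helper. This is a sketch under assumptions rather than a definitive version: the function name phase_mean is hypothetical, it assumes a 1-D input with at least k elements, and it pads with -a.size % k (instead of k - a.size % k) so that no extra all-zero row is added when the length is already a multiple of k. The mask is written as np.arange(k) < a.size % k, which is equivalent to the comparison used above.

```python
import numpy as np

def phase_mean(a, k):
    """Mean of a[i::k] for each i in range(k), using existing data only.

    Hypothetical helper; assumes a is 1-D and a.size >= k.
    """
    a = np.asarray(a, dtype=float)
    pad = -a.size % k  # 0 when a.size is already a multiple of k
    # Zero padding does not change the column sums.
    sums = np.pad(a, (0, pad), mode='constant').reshape(-1, k).sum(axis=0)
    # Complete rows, plus one for each column that reaches into the partial row.
    counts = a.size // k + (np.arange(k) < a.size % k)
    return sums / counts

print(phase_mean([2, 2, 3, 2, 2, 3, 2, 2, 3, 6], 3))  # values 3, 2, 3
```

If a has fewer than k elements, some columns contain no data and the division yields nan for them, so that case would need separate handling.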