muon - 1 year ago 176
Python Question

# numpy cumsum columns for varying lengths specified by list

How can I cumsum n consecutive elements in each column, where the group lengths are specified by a list? The cumsum resets at the first row of each new group. For example, lens = [2, 3] means a cumsum over the first 2 rows (a[:2]), then a separate cumsum over the next 3 rows (a[2:5]).

import numpy as np
lens = [3, 2]
a = np.array(
[[ 1, 2],
[ 1, 2],
[ 1, 2],
[ 1, 2],
[ 1, 2]])

should give

np.array(
[[ 1, 2],
[ 2, 4],
[ 3, 6],
[ 1, 2],
[ 2, 4]])

I am trying to avoid loops.

One option is to split the array at the group boundaries, cumsum each piece, and then concatenate the results:

np.concatenate(list(map(lambda a: np.cumsum(a, axis=0), np.array_split(a, np.cumsum(lens)))))
#array([[1, 2],
#       [2, 4],
#       [3, 6],
#       [1, 2],
#       [2, 4]], dtype=int32)
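
Because np.cumsum(lens) ends with the total row count, np.array_split above also returns a trailing empty piece, which np.concatenate absorbs harmlessly. A minimal sketch of the same idea that drops that last split point is shown here; the function name segment_cumsum_split is made up for illustration, and it assumes sum(lens) equals the number of rows in a:

```python
import numpy as np

def segment_cumsum_split(a, lens):
    # Hypothetical helper name; assumes sum(lens) == a.shape[0].
    # Split at every group end except the last, so no empty trailing piece is produced.
    bounds = np.cumsum(lens)[:-1]
    pieces = np.split(a, bounds, axis=0)
    # cumsum each piece on its own, then stack the pieces back together
    return np.concatenate([p.cumsum(axis=0) for p in pieces], axis=0)

a = np.array([[1, 2]] * 5)
result = segment_cumsum_split(a, [3, 2])
print(result.tolist())  # [[1, 2], [2, 4], [3, 6], [1, 2], [2, 4]]
```

This still iterates over the groups at the Python level (just as map does), so it suits cases with few, long groups.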

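Another option, without splitting and recombining, is to build an auxiliary array that resets the running sum at each group boundary. At the first row of every group after the first, it holds the negative sum of the previous group, so by that row the total subtracted equals the sum of all earlier groups:

idx = np.cumsum([0] + lens)[:-1]                  # row index where each group starts
aux = np.zeros_like(a)
aux[idx[1:], :] = -np.add.reduceat(a, idx)[:-1]   # negative sum of the preceding group
(a + aux).cumsum(0)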

#array([[1, 2],
#       [2, 4],
#       [3, 6],
#       [1, 2],
#       [2, 4]], dtype=int32)
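
A related way to get the same reset effect, sketched here as an alternative rather than as the method above: take the plain cumsum of the whole array, then subtract from each group the running total accumulated before that group started. The function name segment_cumsum_offsets is made up for illustration, and the sketch assumes sum(lens) equals the number of rows in a.

```python
import numpy as np

def segment_cumsum_offsets(a, lens):
    # Hypothetical helper name; assumes sum(lens) == a.shape[0].
    lens = np.asarray(lens)
    total = a.cumsum(axis=0)              # plain running sum over all rows
    starts = np.cumsum(lens) - lens       # first row index of each group
    # Prepend a zero row so padded[i] is the running total just before row i.
    zero_row = np.zeros((1,) + a.shape[1:], dtype=total.dtype)
    padded = np.concatenate([zero_row, total], axis=0)
    before = padded[starts]               # one offset row per group
    # Repeat each group's offset over its rows and subtract it.
    return total - np.repeat(before, lens, axis=0)

a = np.array([[1, 2]] * 5)
out = segment_cumsum_offsets(a, [3, 2])
print(out.tolist())  # [[1, 2], [2, 4], [3, 6], [1, 2], [2, 4]]
```

This version accepts lens as either a list or an ndarray and tolerates zero-length groups. With floating-point data, both this and the auxiliary-array trick accumulate across group boundaries, so rounding can differ slightly from summing each group separately.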

The two methods are about the same speed:

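def split_concat(a):
    return np.concatenate(list(map(lambda a: np.cumsum(a, axis=0), np.array_split(a, np.cumsum(lens)))))

def reset_sum(a):
    # lens is read from the enclosing scope. With the ndarray lens used below,
    # [0] + lens adds 0 elementwise; the starts are still right because lens[0] == 0.
    idx = np.cumsum([0] + lens)[:-1]
    aux = np.zeros_like(a)
    aux[idx[1:], :] = -np.add.reduceat(a, idx)[:-1]   # negative sum of the preceding group
    return (a + aux).cumsum(0)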

lens = np.arange(1000)
a = np.ones((lens.sum(), 2))
(reset_sum(a) == split_concat(a)).all()
# True

%timeit split_concat(a)
# 12.8 ms ± 35.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit reset_sum(a)
# 13.6 ms ± 87.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)