justin justin - 29 days ago 8
Python Question

summing all possible combinations of an arbitrary number of arrays and applying limits

I am trying to generate an array of all combinations of an arbitrary number of arrays. From the generated array, I would like to add an constraint that the sum of the numbers must lie between two bounds (say 'lower' and 'upper')

One method of doing this is to use cartersian, sum the elements, and select the ones that fall within the lower and upper bounds. However, the main limitation is that it is possible to run out of memory given a large number of input arrays. Another method is to use itertools.product:

import itertools
import numpy as np

def arraysums(arrays,lower,upper):
p = itertools.product(*arrays)
r = list()

for n in p:
s = sum(n)
if lower <= s <= upper:
r.append(n)

return r

N = 8
a = np.arange(N)
b = np.arange(N)-N/2

arraysums((a,b),lower=5,upper=6)


which returns a result like this:

[(2, 3),
(3, 2),
(3, 3),
(4, 1),
(4, 2),
(5, 0),
(5, 1),
(6, -1),
(6, 0),
(7, -2),
(7, -1)]


This method is memory efficient, but can be very slow if the arrays are large, such as this example which runs in the 10's of minutes:

a = np.arange(32.)
arraysums(6*(a,),lower=10,upper=20)


I'm looking for a faster method.

Answer

You could use recursion. The main advantage here is that you can short-circuit the enumeration of tuples at each stage.

For example, if item has been selected from the first array, then the new lower and upper limits for the rest of the arrays should be lower-item and upper-item, and you can automatically throw out any value in the other arrays that is bigger than upper-item. This intelligently reduces the size of your search space at each level of the recursion.

import itertools
def arraysums_recursive(arrays, lower, upper):
    if len(arrays) <= 1:
        result = [(item,) for item in arrays[0] if lower <= item <= upper]
    else:
        result = []
        for item in arrays[0]:
            subarrays = [[item2 for item2 in arr if item2 <= upper-item] 
                      for arr in arrays[1:]]
            result.extend(
                [(item,)+tup for tup in arraysums(subarrays, lower-item, upper-item)])
    return result

def arraysums(arrays,lower,upper):
    p = itertools.product(*arrays)
    r = list()

    for n in p:
        s = sum(n)
        if lower <= s <= upper:
            r.append(n)

    return r

a = list(range(32))

For this test case, arraysums_recursive is over 67x faster than arraysums:

In [70]: %time arraysums_recursive(6*(a,),lower=10,upper=20)
CPU times: user 3.67 s, sys: 0 ns, total: 3.67 s
Wall time: 3.68 s

In [73]: %time arraysums(6*(a,),lower=10,upper=20)
CPU times: user 4min 8s, sys: 0 ns, total: 4min 8s
Wall time: 4min 8s
Comments