Paul Paul - 2 months ago 6
Python Question

Compute all ways to bin a series of integers into N bins, where each bin only contains contiguous numbers

I want find all possible ways to map a series of (contiguous) integers M = {0,1,2,...,m} to another series of integers N = {0,1,2,...,n} where m > n, subject to the constraint that only contiguous integers in M map to the same integer in N.

The following piece of python code comes close (

start
corresponds to the first element in M,
stop
-1 corresponds to the last element in M, and
nbins
corresponds to |N|):

import itertools
def find_bins(start, stop, nbins):
if (nbins > 1):
return list(list(itertools.product([range(start, ii)], find_bins(ii, stop, nbins-1))) for ii in range(start+1, stop-nbins+2))
else:
return [range(start, stop)]


E.g

In [20]: find_bins(start=0, stop=5, nbins=3)
Out[20]:
[[([0], [([1], [2, 3, 4])]),
([0], [([1, 2], [3, 4])]),
([0], [([1, 2, 3], [4])])],
[([0, 1], [([2], [3, 4])]),
([0, 1], [([2, 3], [4])])],
[([0, 1, 2], [([3], [4])])]]


However, as you can see the output is nested, and for the life of me, I cant find a way to properly amend the code without breaking it.

The desired output would look like this:

In [20]: find_bins(start=0, stop=5, nbins=3)
Out[20]:
[[(0), (1), (2, 3, 4)],
[(0), (1, 2), (3, 4)],
[(0), (1, 2, 3), (4)],
[(0, 1), (2), (3, 4)],
[(0, 1), (2, 3), (4)],
[(0, 1, 2), (3), (4)]]

Answer

I suggest a different approach: a partitioning into n non-empty bins is uniquely determined by the n-1 distinct indices marking the boundaries between the bins, where the first marker is after the first element, and the final marker before the last element. itertools.combinations() can be used directly to generate all such index tuples, and then it's just a matter of using them as slice indices. Like so:

def find_nbins(start, stop, nbins):
    from itertools import combinations
    base = range(start, stop)
    nbase = len(base)
    for ixs in combinations(range(1, stop - start), nbins - 1):
        yield [tuple(base[lo: hi])
               for lo, hi in zip((0,) + ixs, ixs + (nbase,))]

Then, e.g.,

for x in find_nbins(0, 5, 3):
    print(x)

displays:

[(0,), (1,), (2, 3, 4)]
[(0,), (1, 2), (3, 4)]
[(0,), (1, 2, 3), (4,)]
[(0, 1), (2,), (3, 4)]
[(0, 1), (2, 3), (4,)]
[(0, 1, 2), (3,), (4,)]

EDIT: Making it into 2 problems

Just noting that there's a more general underlying problem here: generating the ways to break an arbitrary sequence into n non-empty bins. Then the specific question here is applying that to the sequence range(start, stop). I believe viewing it that way makes the code easier to understand, so here it is:

def gbins(seq, nbins):
    from itertools import combinations
    base = tuple(seq)
    nbase = len(base)
    for ixs in combinations(range(1, nbase), nbins - 1):
        yield [base[lo: hi]
               for lo, hi in zip((0,) + ixs, ixs + (nbase,))]

def find_nbins(start, stop, nbins):
    return gbins(range(start, stop), nbins)
Comments