RussW - 1 month ago 10
Python Question

# N random, contiguous and non-overlapping subsequences each of length

I'm trying to get n random and non-overlapping slices of a sequence where each subsequence is of length l, preferably in the order they appear.

This is the code I have so far and it's gotten more and more messy with each attempt to make it work, needless to say it doesn't work.

``````def rand_parts(seq, n, l):
"""
return n random non-overlapping partitions each of length l.
If n * l > len(seq) raise error.
"""
if n * l > len(seq):
raise Exception('length of seq too short for given n, l arguments')
if not isinstance(seq, list):
seq = list(seq)
gaps = [0] * (n + 1)
for g in xrange(len(seq) - (n * l)):
gaps[random.randint(0, len(gaps) - 1)] += 1
result = []
for i, g in enumerate(gaps):
x = g + (i * l)
result.append(seq[x:x+l])
if i < len(gaps) - 1:
gaps[i] += x
return result
``````

For example if we say
`rand_parts([1, 2, 3, 4, 5, 6], 2, 2)`
there are 6 possible results that it could return from the following diagram:

``````[1, 2, 3, 4, 5, 6]
____  ____

[1, 2, 3, 4, 5, 6]
____     ____

[1, 2, 3, 4, 5, 6]
____        ____

[1, 2, 3, 4, 5, 6]
____  ____

[1, 2, 3, 4, 5, 6]
____     ____

[1, 2, 3, 4, 5, 6]
____  ____
``````

So
`[[3, 4], [5, 6]]`
would be acceptable but
`[[3, 4], [4, 5]]`
wouldn't because it's overlapping and
`[[2, 4], [5, 6]]`
also wouldn't because
`[2, 4]`
isn't contiguous.

I encountered this problem while doing a little code golfing so for interests sake it would also be nice to see both a simple solution and/or an efficient one, not so much interested in my existing code.

``````def rand_parts(seq, n, l):
indices = xrange(len(seq) - (l - 1) * n)
result = []
offset = 0
for i in sorted(random.sample(indices, n)):
i += offset
result.append(seq[i:i+l])
offset += l - 1
return result
``````

To understand this, first consider the case `l == 1`. Then it's basically just returning a `random.sample()` of the input data in sorted order; in this case the `offset` variable is always 0.

The case where `l > 1` is an extension of the previous case. We use `random.sample()` to pick up positions, but maintain an `offset` to shift successive results: in this way, we make sure that they are non-overlapping ranges --- i.e. they start at a distance of at least `l` of each other, rather than 1.