RussW - 1 month ago 10

Python Question

I'm trying to get n random and non-overlapping slices of a sequence where each subsequence is of length l, preferably in the order they appear.

This is the code I have so far and it's gotten more and more messy with each attempt to make it work, needless to say it doesn't work.

`def rand_parts(seq, n, l):`

"""

return n random non-overlapping partitions each of length l.

If n * l > len(seq) raise error.

"""

if n * l > len(seq):

raise Exception('length of seq too short for given n, l arguments')

if not isinstance(seq, list):

seq = list(seq)

gaps = [0] * (n + 1)

for g in xrange(len(seq) - (n * l)):

gaps[random.randint(0, len(gaps) - 1)] += 1

result = []

for i, g in enumerate(gaps):

x = g + (i * l)

result.append(seq[x:x+l])

if i < len(gaps) - 1:

gaps[i] += x

return result

For example if we say

`rand_parts([1, 2, 3, 4, 5, 6], 2, 2)`

`[1, 2, 3, 4, 5, 6]`

____ ____

[1, 2, 3, 4, 5, 6]

____ ____

[1, 2, 3, 4, 5, 6]

____ ____

[1, 2, 3, 4, 5, 6]

____ ____

[1, 2, 3, 4, 5, 6]

____ ____

[1, 2, 3, 4, 5, 6]

____ ____

So

`[[3, 4], [5, 6]]`

`[[3, 4], [4, 5]]`

`[[2, 4], [5, 6]]`

`[2, 4]`

I encountered this problem while doing a little code golfing so for interests sake it would also be nice to see both a simple solution and/or an efficient one, not so much interested in my existing code.

Answer Source

```
def rand_parts(seq, n, l):
indices = xrange(len(seq) - (l - 1) * n)
result = []
offset = 0
for i in sorted(random.sample(indices, n)):
i += offset
result.append(seq[i:i+l])
offset += l - 1
return result
```

To understand this, first consider the case `l == 1`

. Then it's basically just returning a `random.sample()`

of the input data in sorted order; in this case the `offset`

variable is always 0.

The case where `l > 1`

is an extension of the previous case. We use `random.sample()`

to pick up positions, but maintain an `offset`

to shift successive results: in this way, we make sure that they are non-overlapping ranges --- i.e. they start at a distance of at least `l`

of each other, rather than 1.