kjo kjo - 3 months ago 8
Python Question

Simple idiom to break an n-long list into k-long chunks, when n % k > 0?

In Python, it is easy to break an n-long list into k-size chunks if n is a multiple of k (i.e., n % k == 0). Here's my favorite approach (straight from the docs):

>>> k = 3
>>> n = 5 * k
>>> x = range(n)
>>> zip(*[iter(x)] * k)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]


(The trick is that [iter(x)] * k produces a list of k references to the same iterator, as returned by iter(x). Then zip generates each chunk by pulling one item from each of the k references in turn, which advances the single shared iterator k times per chunk. The * before [iter(x)] * k is necessary because zip expects to receive its arguments as separate iterators, rather than as a list of them.)
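
To make the shared-iterator point concrete, here is a minimal sketch. It assumes Python 3, where zip is lazy, so list() is used to show the result; the names it, refs and same_object are only illustrative.

```python
k = 3
x = range(5 * k)

it = iter(x)
refs = [it] * k  # k references to ONE iterator object, not k independent copies

# Every element of refs is the very same object as `it`.
same_object = all(r is it for r in refs)

# zip pulls one item from each reference in turn, so each tuple
# consumes k consecutive items from the single underlying iterator.
chunks = list(zip(*refs))

print(same_object)
print(chunks)
```

If the k entries were independent iterators instead (for example [iter(x) for _ in range(k)]), each tuple would just repeat the same element k times.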

The main shortcoming I see with this idiom is that, when n is not a multiple of k (i.e., n % k > 0), the leftover entries are silently dropped; e.g.:

>>> zip(*[iter(x)] * (k + 1))
[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]


There's an alternative idiom that is slightly longer to type, produces the same result as the one above when n % k == 0, and has a more acceptable behavior when n % k > 0:

>>> map(None, *[iter(x)] * k)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]
>>> map(None, *[iter(x)] * (k + 1))
[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, None)]


At least here the leftover entries are retained, but the last chunk gets padded with None. If one just wants a different value for the padding, then itertools.izip_longest (with its fillvalue argument) solves the problem.
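
For reference, a small sketch of the custom-padding variant. The fill value -1 is an arbitrary choice for illustration, and the import fallback is there because the function is named zip_longest in Python 3 and izip_longest in Python 2.

```python
try:
    from itertools import zip_longest  # Python 3 name
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2 name

k = 3
x = range(5 * k)

# Same shared-iterator trick, but the short last chunk is padded
# with fillvalue instead of being dropped.
padded = list(zip_longest(*[iter(x)] * (k + 1), fillvalue=-1))
print(padded)
```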

But suppose the desired solution is one in which the last chunk is left unpadded, i.e.

[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14)]


Is there a simple way to modify the map(None, *[iter(x)]*k) idiom to produce this result?

(Granted, it is not difficult to solve this problem by writing a function (see, for example, the many fine replies to How do you split a list into evenly sized chunks in Python? or What is the most "pythonic" way to iterate over a list in chunks?). Therefore, a more accurate title for this question would be "How to salvage the map(None, *[iter(x)]*k) idiom?", but I think that would baffle a lot of readers.)

I was struck by how easy it is to break a list into even-sized chunks, and how difficult (in comparison!) it is to get rid of the unwanted padding, even though the two problems seem of comparable complexity.

Answer
[x[i:i+k] for i in range(0,n,k)]
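
As a runnable sketch of the slicing approach above, assuming Python 3 and the question's setup: x is materialized as a list because slicing a Python 3 range yields range objects rather than lists. The names size and chunked are hypothetical, introduced only for this illustration. The second variant is not from the answer; it is one possible way to get tuples from any iterable using itertools.islice.

```python
from itertools import islice

k = 3
n = 5 * k
x = list(range(n))
size = k + 1  # chunk length that does not divide n evenly

# Slicing variant: needs a sequence with a known length; yields lists.
by_slicing = [x[i:i + size] for i in range(0, n, size)]

def chunked(iterable, size):
    """Return an iterator of tuples of up to `size` items; the last may be shorter."""
    it = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it
    # returns the empty tuple, i.e. until `it` is exhausted.
    return iter(lambda: tuple(islice(it, size)), ())

by_islice = list(chunked(range(n), size))

print(by_slicing)
print(by_islice)
```

The slicing version only works on objects that support len() and slicing, while the islice version accepts any iterable, at the cost of an extra import.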