cavaunpeu - 1 year ago
Python Question

How to split a Python generator of tuples into 2 separate generators?

I have a generator that is roughly as follows:

def gen1():
    for x, y in enumerate(xrange(20)):
        a = 5*x
        b = 10*y
        yield a, b

From this generator, I would like to create 2 separate generators as follows:

for a in gen1_split_a():
    yield a

for b in gen1_split_b():
    yield b

What's my play, SA?

Answer Source

You can't, not without ending up holding all generator output just to be able to produce b values in the second loop. That can get costly in terms of memory.

You'd use itertools.tee() to 'duplicate' the generator:

from itertools import tee

def split_gen(gen):
    gen_a, gen_b = tee(gen, 2)
    return (a for a, b in gen_a), (b for a, b in gen_b)

gen1_split_a, gen1_split_b = split_gen(gen1())

for a in gen1_split_a:
    print a

for b in gen1_split_b:
    print b

but because the first loop exhausts gen1_split_a before the second loop starts, the tee object ends up having to store everything gen1 produces. From the documentation:

This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee().
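
To make that buffering visible, here is a minimal sketch that counts how many items have been pulled from the source by the time the first split generator is exhausted. The counting wrapper and the pulled list are hypothetical helpers introduced only for illustration; gen1 and split_gen are repeated from above, with range in place of xrange so the snippet should run on both Python 2 and 3.

```python
from itertools import tee

def gen1():
    for x, y in enumerate(range(20)):
        yield 5 * x, 10 * y

def split_gen(gen):
    gen_a, gen_b = tee(gen, 2)
    return (a for a, b in gen_a), (b for a, b in gen_b)

pulled = []  # hypothetical log of every item taken from the source

def counting(gen):
    # Hypothetical wrapper: records each item as the source hands it out.
    for item in gen:
        pulled.append(item)
        yield item

split_a, split_b = split_gen(counting(gen1()))

a_values = list(split_a)      # exhaust the first generator only
pulled_after_a = len(pulled)  # how much of the source has been consumed so far
b_values = list(split_b)      # no new items are pulled; tee replays its buffer
```

If pulled_after_a already equals the full length of the source, every tuple had to be held by tee until the second generator caught up, which is the cost the documentation warns about.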

Following that advice, just put the b values into a list for the second loop:

b_values = []
for a, b in gen1():
    print a
    b_values.append(b)

for b in b_values:
    print b

or better yet, just process both a and b in the one loop.
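
A minimal sketch of the single-loop approach, assuming the work on a and b can be done as each tuple arrives. The running totals are only a stand-in for whatever processing is actually needed, and range replaces xrange so the snippet should run on both Python 2 and 3.

```python
def gen1():
    for x, y in enumerate(range(20)):
        yield 5 * x, 10 * y

a_total = 0  # placeholder "processing" for the a values
b_total = 0  # placeholder "processing" for the b values

for a, b in gen1():
    # Handle both halves of each tuple as soon as it is produced,
    # so nothing has to be buffered for a later pass.
    a_total += a
    b_total += b
```

Only one tuple is alive at a time here, so memory use stays constant no matter how many items gen1 yields.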