anderspitman anderspitman - 4 months ago 15
Python Question

Generator not closing over data as expected

Sorry if the title is poorly worded, I'm not sure how to phrase it. I have a function that basically iterates over the 2nd dimension of a 2 dimensional iterable. The following is a simple reproduction:

words = ['ACGT', 'TCGA']

def make_lists():
    for i in range(len(words[0])):
        iter_ = iter([word[i] for word in words])
        yield iter_

lists = list(make_lists())

for list_ in lists:
    print(list(list_))


Running this outputs:

['A', 'T']
['C', 'C']
['G', 'G']
['T', 'A']


I would like to yield generators instead of having to evaluate words, in case words is very long, so I tried the following:

words = ['ACGT', 'TCGA']

def make_generators():
    for i in range(len(words[0])):
        gen = (word[i] for word in words)
        yield gen

generators = list(make_generators())

for gen in generators:
    print(list(gen))


However, running outputs:

['T', 'A']
['T', 'A']
['T', 'A']
['T', 'A']


I'm not sure exactly what's happening. I suspect it has something to do with the generator comprehension not closing over its scope when yielded, so they're all sharing. If I create the generators inside a separate function and yield the return from that function it seems to work.

Answer

i is a free variable for those generators, so by the time they are consumed they all use its last value, i.e. 3. In simple words, they know where to fetch the value of i from, but they do not record the value i had when they were created. So, something like this:

def make_iterator():
    for i in range(len(words[0])):
        gen = (word[i] for word in words)
        yield gen
    i = 0  # Modified the value of i 

will result in:

['A', 'T']
['A', 'T']
['A', 'T']
['A', 'T']

Generator expressions are implemented with their own function scope and are evaluated lazily, so the body looks up i only when the generator is consumed. A list comprehension, on the other hand, runs right away and reads the value of i during that iteration itself. (List comprehensions are also implemented with a function scope in Python 3, but the difference is that they are not lazy.)
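The lazy versus eager lookup can be seen without the loop at all. This is only a small sketch using a module-level i; the names lazy, eager and lazy_result are illustrative and not from the question:

```python
words = ['ACGT', 'TCGA']

i = 0
lazy = (word[i] for word in words)    # body has not run yet; i is looked up later
eager = [word[i] for word in words]   # body runs now, while i == 0

i = 3  # rebind i before the generator is consumed

lazy_result = list(lazy)
print(eager)        # ['A', 'T']
print(lazy_result)  # ['T', 'A']
```

The list was built while i was 0, whereas the generator only read i when list() consumed it.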

A fix is to use an inner function that captures the actual value of i in each iteration through a default argument value:

words = ['ACGT', 'TCGA']

def make_iterator():
    for i in range(len(words[0])):
        # default argument value is calculated at the time of
        # function creation, hence for each generator it is going
        # to be the value at the time of that particular iteration  
        def inner(i=i):
            return (word[i] for word in words)
        yield inner()

generators = list(make_iterator())

for gen in generators:
    print(list(gen))
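The question mentions that creating each generator in a separate function also seems to work. That fits the explanation above: a function parameter is a fresh binding on every call. Below is a rough sketch of that variant, plus an alternative based on map and operator.itemgetter that should stay lazy while binding the index immediately. The names column, make_columns and make_columns_map are hypothetical, chosen just for this example:

```python
from operator import itemgetter

words = ['ACGT', 'TCGA']

def column(words, i):
    # i is a parameter here, so each call gets its own binding
    return (word[i] for word in words)

def make_columns(words):
    for i in range(len(words[0])):
        yield column(words, i)

def make_columns_map(words):
    for i in range(len(words[0])):
        # itemgetter(i) captures the current i right away; map itself is lazy
        yield map(itemgetter(i), words)

columns = [list(gen) for gen in list(make_columns(words))]
columns_map = [list(gen) for gen in list(make_columns_map(words))]
print(columns)      # [['A', 'T'], ['C', 'C'], ['G', 'G'], ['T', 'A']]
print(columns_map)  # same result
```

Both versions avoid the default-argument trick, which some readers find easier to follow.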
