godaygo - 2 months ago
Python Question

Generator expression evaluation with several ... for ... in ... parts

Question: What does Python do under the hood when it sees this kind of expression?

sum(sum(i) for j in arr for i in j)





My thoughts: the expression above works. But the Python docs say:


generator expressions are implemented using a function scope


Not to be verbose :) I have an array with the following layout (as an example):

>>> arr = [
[[1,2,3], [4,5,6]],
[[7,8,9],[10,11,12]]
]


I try to sum all elements of arr with the following expression:

>>> sum(sum(i) for i in j for j in arr)
NameError: name 'j' is not defined


It raises NameError, but why not UnboundLocalError: local variable 'j' referenced before assignment, if it is implemented using a function scope? Next, I tried to "unroll" this expression as follows, which looked roughly equivalent to me (please correct me if I'm wrong):

>>> gen = (j for j in arr)
>>> sum(sum(i) for i in gen)
TypeError: unsupported operand type(s) for +: 'int' and 'list'


But it raises TypeError, and I can't figure out why. Any links on what to read about this would be very useful.




EDIT:

I get the idea now. Thanks to @vaultah for the insight. In this case j, the iterable of the first for clause, is evaluated in the enclosing scope and sent to the generator expression as an argument:

>>> sum(sum(i) for i in j for j in arr) # NameError


That's why I get this weird NameError.
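
A small sketch of that behaviour, assuming the same arr as above; undefined_name and the two flag variables are made-up names for illustration. The first iterable is looked up as soon as the generator expression is created, while the iterables of later for clauses are only looked up once the generator is consumed:

```python
arr = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

# The first iterable (`j`) is evaluated immediately, in the enclosing scope,
# so the NameError appears before any iteration happens.
try:
    gen = (sum(i) for i in j for j in arr)
    failed_on_creation = False
except NameError:
    failed_on_creation = True

# A later iterable (`undefined_name`, a hypothetical missing name) is evaluated
# lazily inside the generator, so creation succeeds and the error only
# shows up on the first next().
lazy_gen = (sum(i) for j in arr for i in undefined_name)
try:
    next(lazy_gen)
    failed_on_iteration = False
except NameError:
    failed_on_iteration = True

print(failed_on_creation, failed_on_iteration)
```

In the first case nothing inside the generator body has run yet, so no local j exists that could be "referenced before assignment".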




And @Eric's answer shows that:

>>> sum(sum(i) for j in arr for i in j)


is equivalent to:

>>> def __gen(arr):
...     for j in arr:
...         for i in j:
...             yield sum(i)
...
>>> sum(__gen(arr))
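
As a quick check, here is a sketch comparing the two forms on the example arr. It is not how CPython literally compiles the expression, and gen_equivalent is a hypothetical name standing in for the hidden function:

```python
arr = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

def gen_equivalent(outer):
    # `outer` plays the role of the first iterable, which is evaluated
    # at the call site and passed in as an argument.
    for j in outer:
        for i in j:
            yield sum(i)

total_genexp = sum(sum(i) for j in arr for i in j)
total_function = sum(gen_equivalent(arr))
print(total_genexp, total_function)
```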




Answer

Whether it is a generator or a list comprehension, the comprehension nesting is the same. It is easier to see what is going on with a list comprehension.

Given:

>>> arr
[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

You can flatten the list of lists of ints by one level using a nested list comprehension (or generator):

>>> [e for sl in arr for e in sl]
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

You can flatten completely, given that structure, by nesting again (example only; there are better ways to flatten a deeply nested list):

>>> [e2 for sl2 in [e for sl in arr for e in sl] for e2 in sl2]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
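
One such alternative, sketched here on the assumption that the nesting depth is fixed and known in advance, is itertools.chain.from_iterable from the standard library. Each call removes one level of nesting (one_level and flat are illustrative names):

```python
from itertools import chain

arr = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

# One call strips one level: a list of lists of ints remains.
one_level = list(chain.from_iterable(arr))

# Two chained calls strip both levels: only the ints remain.
flat = list(chain.from_iterable(chain.from_iterable(arr)))

print(one_level)
print(flat)
```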

Since sum takes an iterable, the second flattening is not necessary in your example:

>>> [sum(e) for sl in arr for e in sl]
[6, 15, 24, 33]   # sum of those is 78...

The for clauses in a nested list comprehension (or generator) are read left to right, in the same order as the equivalent nested for loops: the leftmost for is the outermost loop, and each later for is nested inside the one before it.

To unroll the list comprehension into nested loops, write the for clauses in the same order, each one indented inside the previous one:

total = 0
for sl in arr:
    for sl2 in sl:
        for e in sl2:
            # e is now each int in the list of lists of ints;
            # inside a generator function you could use `yield e` here
            total += e
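
A runnable comparison, assuming the same arr as above (flat_comprehension and flat_loops are illustrative names): the three-clause comprehension and the hand-written nested loops visit the ints in the same order.

```python
arr = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

# Comprehension form: the for clauses run left to right, outermost first.
flat_comprehension = [e for sl in arr for sl2 in sl for e in sl2]

# Loop form: the same clauses in the same order, one nested in the next.
flat_loops = []
for sl in arr:
    for sl2 in sl:
        for e in sl2:
            flat_loops.append(e)

print(flat_comprehension == flat_loops, sum(flat_loops))
```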

Your final question: Why do you get a TypeError with gen = (j for j in arr)?

That generator expression does not flatten anything; it just yields each element of arr unchanged. Example:

>>> [j for j in arr]
[[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
>>> [j for j in arr] == arr
True

So the expression [j for j in arr] just builds a shallow copy of arr, with the same nested lists as its elements.

And sum cannot add those elements either, because it starts from the int 0 and a list cannot be added to an int:

>>> sum(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'

Since gen in your example yields those same nested lists, each i in sum(sum(i) for i in gen) is a list of lists, and the inner sum(i) fails with that TypeError.
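
To see this step by step, here is a minimal sketch with the same arr (first and error_message are illustrative names) that pulls one item out of gen and tries to sum it:

```python
arr = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]

gen = (j for j in arr)

# The first item is still a list of lists, not a list of ints.
first = next(gen)

# sum() starts from the int 0, so it attempts 0 + [1, 2, 3] and fails.
try:
    sum(first)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(first)
print(error_message)
```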

To fix it:

>>> gen=(e for sl in arr for e in sl)
>>> sum(sum(li) for li in gen)
78