tlg - 1 year ago 68
Python Question

# Speed to iterate multiple times over a generator compared to a list

I expected that, in the case of multiple loops, iterating over a list would be much faster than using a generator, but my code suggests this is false.

My understanding is (by "operation" I mean evaluating the expression that defines an element):

• a list requires n operations to be initialized

• but then every loop over the list is just grabbing an element from memory

• thus, m loops over a list require only n operations

• a generator does not require any operations to be initialized

• however, looping over a generator runs the operations on the fly

• thus, one loop over a generator requires n operations

• but m loops over a generator require n x m operations
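That bookkeeping can be checked directly in code. A minimal sketch, using a hypothetical `expensive` helper (not in the original) that counts how many times the element-defining expression actually runs:

```python
# Sketch of the operation counting above: a counter tallies how often
# the element-defining expression (here, a stand-in for 2**i) runs.
calls = 0

def expensive(i):
    """Stand-in for the element-defining expression 2**i."""
    global calls
    calls += 1
    return 2 ** i

lst = [expensive(i) for i in range(5)]   # list: 5 operations up front
assert calls == 5

for _ in lst:                            # looping again costs no operations
    pass
assert calls == 5

gen = (expensive(i) for i in range(5))   # generator: 0 operations so far
assert calls == 5

for _ in gen:                            # operations happen on the fly
    pass
assert calls == 10
```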

And I checked my expectations using the following code:

``````from timeit import timeit

def pow2_list(n):
    """Return a list with powers of 2"""

    results = []

    for i in range(n):
        results.append(2**i)

    return results

def pow2_gen(n):
    """Generator of powers of 2"""

    for i in range(n):
        yield 2**i

def loop(iterator, n=1000):
    """Loop n times over iterable object"""

    for _ in range(n):
        for _ in iterator:
            pass

l = pow2_list(1000) # point to a list
g = pow2_gen(1000)  # point to a generator

time_list = \
    timeit("loop(l)", setup="from __main__ import loop, l", number=10)

time_gen = \
    timeit("loop(g)", setup="from __main__ import loop, g", number=10)

print("Loops over list took: ", time_list)
print("Loops over generator took: ", time_gen)
``````

And the results surprised me...

``````Loops over list took:  0.20484769299946493
Loops over generator took:  0.0019217690005461918
``````

Somehow using a generator appears much faster than using a list, even when looping 1000 times. And in this case we are talking about two orders of magnitude! Why?

EDIT:

Thanks for the answers. Now I see my mistake: I wrongly assumed that a generator starts from the beginning on each new loop, like a range:

``````>>> x = range(10)
>>> sum(x)
45
>>> sum(x)
45
``````

But this was naive: `range` is not a generator, and a `range` object can be iterated over repeatedly.
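A minimal sketch of that difference: a `range` object hands out a fresh iterator each time you loop over it, while a generator is itself an iterator and is used up after one pass:

```python
r = range(10)
assert sum(r) == 45
assert sum(r) == 45          # range is re-iterable: each loop gets a fresh iterator

g = (2**i for i in range(10))
assert sum(g) == 1023        # first pass consumes the generator
assert sum(g) == 0           # second pass sees an exhausted generator
```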

Regarding possible duplicate comment: my problem concerned multiple loops over generator, which is not explained in the other thread.

Your generator is actually only looping once. Once created with `pow2_gen`, `g` holds a generator object; the very first pass through `loop` consumes it, after which it raises `StopIteration`. On every later pass through `loop`, `next(g)` just keeps raising `StopIteration`, so in effect `g` represents an empty sequence.
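The exhaustion described above can be observed with `next` (in Python 3 the spelling is `next(g)` rather than `g.next()`); a small sketch:

```python
g = (2**i for i in range(3))
assert next(g) == 1
assert next(g) == 2
assert next(g) == 4

# The generator is now consumed; every further next() raises StopIteration,
# so any subsequent for-loop over g ends immediately.
try:
    next(g)
    raised = False
except StopIteration:
    raised = True
assert raised
```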

To make the comparison more fair, you would need to re-create the generator every time you loop.

A further difficulty with the way you've approached this is that you're calling `append` to build your list, which is likely the slowest way to construct one. More often, lists are built with list comprehensions.
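A quick way to check the two construction styles side by side (a sketch; `build_append` and `build_comp` are hypothetical names for illustration, and timing numbers will vary by machine):

```python
from timeit import timeit

def build_append(n=1000):
    """Build the list element by element with .append."""
    results = []
    for i in range(n):
        results.append(2**i)
    return results

def build_comp(n=1000):
    """Build the same list with a comprehension."""
    return [2**i for i in range(n)]

# Both forms produce identical lists.
assert build_append() == build_comp()

# The comprehension avoids the per-element attribute lookup and call
# overhead of .append, so it is usually somewhat faster.
t_append = timeit(build_append, number=100)
t_comp = timeit(build_comp, number=100)
```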

The following code lets us pick apart the timing a bit more carefully. `create_list` and `create_gen` create lists and generators, respectively, using list comprehension and generator expressions. `time_loop` is like your `loop` method, while `time_apply` is a version of `loop` that re-creates the iterable each time through the loop.

``````def create_list(n=1000):
    return [2**i for i in range(n)]

def create_gen(n=1000):
    return (2**i for i in range(n))

def time_loop(iterator, n=1000):
    for t in range(n):
        for v in iterator:
            pass

def time_apply(create_fn, fn_arg, n=1000):
    for t in range(n):
        iterator = create_fn(fn_arg)
        time_loop(iterator, 1)

print('time_loop(create_list): %.3f' % timeit("time_loop(create_list(1000))",
                                              setup="from __main__ import *",
                                              number=10))

print('time_loop(create_gen): %.3f' % timeit("time_loop(create_gen(1000))",
                                             setup="from __main__ import *",
                                             number=10))

print('time_apply(create_list): %.3f' % timeit("time_apply(create_list, 1000)",
                                               setup="from __main__ import *",
                                               number=10))

print('time_apply(create_gen): %.3f' % timeit("time_apply(create_gen, 1000)",
                                              setup="from __main__ import *",
                                              number=10))
``````

Results on my box suggest that building a list (`time_apply(create_list)`) is similar in time to (or maybe even faster than) building a generator (`time_apply(create_gen)`).

``````time_loop(create_list): 0.244
time_loop(create_gen): 0.028
time_apply(create_list): 21.190
time_apply(create_gen): 21.555
``````

You can see the same effect you documented in your question: `time_loop(create_gen)` is an order of magnitude faster than `time_loop(create_list)`. Again, this is because the generator is only iterated over once and is exhausted thereafter, whereas the list really is traversed on every one of the many loops.

As you hypothesise, building a list once and iterating over it many times (`time_loop(create_list)`) is faster than iterating over a generator many times (`time_apply(create_gen)`) in this particular scenario.

The trade-off between a list and a generator is going to depend strongly on how big the iterable you're creating is. With 1000 items, I would expect lists to be pretty fast. With 100,000 items, things might look different.

``````print('create big list: %.3f' % timeit("l = create_list(100000)",
                                        setup="from __main__ import *",
                                        number=10))

print('create big gen: %.3f' % timeit("g = create_gen(100000)",
                                      setup="from __main__ import *",
                                      number=10))
``````

Here I'm getting:

``````create big list: 209.748
create big gen: 0.023
``````

Python uses up between 700 and 800 MB of memory building the big list; the generator uses almost nothing at all. Memory allocation and garbage collection are computationally expensive in Python and predictably make your code slow; generators are a very simple way to avoid gobbling up your machine's RAM, and can make a big difference to runtime.
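The container-size difference can be sketched with `sys.getsizeof` (note it measures only the list's backing array or the generator object itself, not the elements; a smaller `n` is used here just to keep the sketch fast):

```python
import sys

big_list = [2**i for i in range(1000)]   # materialises all 1000 (large) ints
big_gen = (2**i for i in range(1000))    # stores only the loop state

# The generator object stays a fixed, tiny size no matter how many items
# it will eventually yield; the list grows with its element count.
assert sys.getsizeof(big_gen) < sys.getsizeof(big_list)
```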
