EbraHim - 2 years ago 75
Python Question

# Why does creating a generator using `()` need a lot of memory?

Let assume that I want to find

`n**2`
for all numbers smaller than
`20000000`
.

I have three options to calculate these numbers:

• Creating a list of squares:

``````import time
import psutil
import gc

gc.collect()

mem_before = psutil.virtual_memory()[3]
time1 = time.time()
x = [i**2 for i in range(20000000)]
time2 = time.time()
mem_after =  psutil.virtual_memory()[3]

print "Used Mem = ", (mem_after - mem_before)/(1024**2) #Byte to Megabyte convert.
print "Calculation time = ", time2 - time1
``````

It is really slow and time consuming:

``````Used Mem =  1270 #Megabytes
Calculation time =  33.9309999943 #Seconds
``````

• Creating a generator using
`()`

``````import time
import psutil
import gc

gc.collect()

mem_before = psutil.virtual_memory()[3]
time1 = time.time()
x = (i**2 for i in range(20000000))
time2 = time.time()
mem_after =  psutil.virtual_memory()[3]

print "Used Mem = ", (mem_after - mem_before)/(1024**2)
print "Calculation time = ", time2 - time1
``````

It is much more faster than option1, but still uses a lot of memory:

``````Used Mem =  611
Calculation time =  0.278000116348
``````

• Defining a function that returns a generator (most efficient):

``````import time
import psutil
import gc

gc.collect()

mem_before = psutil.virtual_memory()[3]
time1 = time.time()
def f(n):
i = -1
while(i<n):
i += 1
yield i**2
x = f(20000000)
time2 = time.time()
mem_after =  psutil.virtual_memory()[3]

print "Used Mem = ", (mem_after - mem_before)/(1024**2)
print "Calculation time = ", time2 - time1
``````

Its consumption:

``````Used Mem =  0
Calculation time =  0.0
``````

The questions are:

1. What's the difference between the first and second solutions? Using
`()`
creates a generator, so why does it need a lot of memory?

2. Is there any built-in function equivalent to my third option?

1. As others have pointed out in the comments, `range` creates a `list` in Python 2. Hence, it is not the generator per se that uses up the memory, but the `range` that the generator uses:

``````x = (i**2 for i in range(20000000))
# builds a 2*10**7 element list, not for the squares , but for the bases

>>> sys.getsizeof(range(100))
872
>>> sys.getsizeof(xrange(100))
40
>>> sys.getsizeof(range(1000))
8720
>>> sys.getsizeof(xrange(1000))
40
>>> sys.getsizeof(range(20000000))
160000072
>>> sys.getsizeof(xrange(20000000))
40
``````

This also explains why your second version (the generator expression) uses around half the memory of the first version (the list comprehension) as the first one builds two lists (for the bases and the squares) while the second only builds one list for the bases.

2. `xrange(20000000)` thus, greatly improves memory usage as it returns a lazy iterable. This is essentially the built-in memory efficient way to iterate over a range of numbers that mirrors your third version (with the added flexibility of `start`, `stop` and `step`):

``````x = (i**2 for i in xrange(20000000))
``````

In Python 3, `range` is essentially what `xrange` used to be in Python 2. However, the Python 3 `range` object has some nice features that Python 2's `xrange` doesn't have, like `O(1)` slicing, contains, etc.

## Some references:

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download