jcuenod jcuenod - 1 month ago 6
Python Question

Why is a sorted list bigger than an unsorted list

I have a list

my_list
(I don't know whether it's relevant, but the list is a filled with Unicode Hebrew strings):

>>> len(my_list)
8777
>>> from sys import getsizeof
>>> getsizeof(my_list)
77848


When I sort
my_list
though, it gets bigger:

>>> my_sorted_list = sort(my_list)
>>> len(my_sorted_list)
8777
>>> getsizeof(my_sorted_list)
79104


Why is
sort
returning a list that takes more space in memory than the initial unsorted list?

Answer

As Ignacio points out this is due to Python allocating a bit more memory than required. sorted creates a new list out of the sequence provided, sorts it in place and returns it. To create the new list Python extends an empty sized list with the one passed; that results in over-allocation (after calling list_resize).

It is worth noting that this difference is mostly present when the original list was created with a list-comp (where if, a final append doesn't trigger a resize if space is available, the size is smaller) or when literals are used (for which no appends are made and direct assigning to the underlying array is performed, keeping the size to its minimum):

l = [i for i in range(1000)]
getsizeof(l)
Out[85]: 9024

getsizeof(sorted(l))
Out[84]: 9112

When creating through list, the memory is always over-allocated; Python knows the sizes and preempts future modifications by over-allocating a bit based on the size:

l = list(range(1000))
getsizeof(l)
Out[86]: 9112

getsizeof(sorted(l))
Out[87]: 9112

It's also required to point out that this is behavior specific the C implementation of Python. Jython, IronPython and others might/might not have the same behavior.

Comments