mohit mohit - 2 months ago 16
Python Question

why does python takes longer time to sort a copy of list?

Why is there so much difference when sorting x and y and y is just a copy of x? Does python not copy the list right away?

python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); x.sort()'
100000 loops, best of 3: 19.5 usec per loop
python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); y.sort()'
1000 loops, best of 3: 211 usec per loop
python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'x.sort()'
100000 loops, best of 3: 15.9 usec per loop

Answer

Your first and last examples have to sort the list just once. After that, the sorting algorithm Python uses has an easy time of it as it is optimised to take advantage of already sorted sequences.

From the Wikipedia article on TimSort (the Python sorting algorithm):

The algorithm finds subsequences of the data that are already ordered, and uses that knowledge to sort the remainder more efficiently.

In other words, the list(y); y.sort() case is disadvantaged because it is given a clean, unsorted list each time. The x.sort() case is given a fully sorted list (after the first iteration). After all, list.sort() sorts in place, altering the list. The timeit test does not create a copy for each test, so subsequent tests re-use the same list.

If you switch to using the sorted() function, you'll see no time advantage anymore, as sorted() returns a new, copied and sorted list instead of sorting in-place:

python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); sorted(x)'
10000 loops, best of 3: 151 usec per loop
python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); sorted(y)'
10000 loops, best of 3: 155 usec per loop
python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'sorted(x)'
10000 loops, best of 3: 152 usec per loop