mohit - 8 months ago 41

Python Question

Why is there so much difference when sorting x and y and y is just a copy of x? Does python not copy the list right away?

`python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); x.sort()'`

100000 loops, best of 3: 19.5 usec per loop

python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); y.sort()'

1000 loops, best of 3: 211 usec per loop

python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'x.sort()'

100000 loops, best of 3: 15.9 usec per loop

Answer

Your first and last examples have to sort the list **just once**. After that, the sorting algorithm Python uses has an easy time of it as it is optimised to take advantage of *already sorted sequences*.

From the Wikipedia article on TimSort (the Python sorting algorithm):

The algorithm finds subsequences of the data that are already ordered, and uses that knowledge to sort the remainder more efficiently.

In other words, the `list(y); y.sort()`

case is *disadvantaged* because it is given a clean, unsorted list each time. The `x.sort()`

case is given a fully sorted list (after the first iteration). After all, `list.sort()`

sorts *in place*, altering the list. The `timeit`

test does not create a copy for each test, so subsequent tests re-use the same list.

If you switch to using the `sorted()`

function, you'll see no time advantage anymore, as `sorted()`

returns a *new, copied and sorted list* instead of sorting in-place:

```
python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); sorted(x)'
10000 loops, best of 3: 151 usec per loop
python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'y=list(x); sorted(y)'
10000 loops, best of 3: 155 usec per loop
python -mtimeit -s'import random; x=range(1000); random.shuffle(x)' 'sorted(x)'
10000 loops, best of 3: 152 usec per loop
```