Parth Bhoiwala - 9 months ago 40
Python Question

# Why sorting in the end is faster than inserting in sorting order?

I am generating 100 random integers and I want to store them in a sorted array. The first approach that I tried was using a binary search to find the proper index to store each number at and then insert the number at that index. This way, after 100th random number, I will have a sorted array. Binary search has a time complexity of

`log(N)`
and
`insert()`
method has a time complexity of
`N`
so the final Big-O should be
`O(Nlog(N))`
right?

Below is the code for this approach:

``````def binary_search(start, end, item):
mid = (start + end)/2
if item > mlist[end]:
return end+1
elif item > mlist[mid]:
return binary_search(mid+1, end, item)
elif item < mlist[mid]:
return binary_search(start, mid-1, item)
else:
return int(math.ceil(mid))

begin = time.time()
for i in range(100):
rand = randint(0,100)
index = binary_search(0,len(mlist)-1,rand)
mlist.insert(index,rand)
elapsed = time.time()
print((elapsed-begin)*(10**4))
``````

When I printed the difference between elapsed and begin time, I got 4.2414 microseconds.

My second approach was to simply add all random numbers in an array/list and then use the built in
`sort()`
method to sort it. The time complexity for
`sort()`
method is
`Nlog(N)`
.

``````begin = time.time()
mlist=[]
for i in range(100):
rand = randint(0,100)
mlist.append(rand)
mlist = sorted(mlist)
elapsed = time.time()
print((elapsed-begin)*(10**4))
``````

The elapsed time for this approach was 1.9407 microseconds.

I don't understand that if the time complexities for both methods are same then what makes the second approach so much faster?

Your binary search insertion is O(N^2); each insertion has to move up to O(N) elements up one step to the right, and you do this N times. However, even if it was O(NlogN), the constant cost of the sorting code is far lower than your Python code could match.

If you do want to stick to bisect insertion sorting, rather than re-invent the (admittedly simple) `bisect` wheel, do use the `bisect` module. This module comes with a C-optimised implementation.

It even has `bisect.insort*()` functions, which note:

Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.

Another tip: don't use wall-clock time to measure algorithms. Use the `timeit` module, which disables the garbage collector, uses the most accurate clock available, and runs your test multiple times to eliminate external factors.

Next, don't include creating the random values, you don't want to time how fast those can be produced; produce one list, up front, and re-use it for all timings.

Next, use a proper `bisect()` function, yours is broken for any `len(mlist) < 2`; there is no need to test for `mlist[end]` for example. The following avoids an off-by-one error and uses `mlist` as a parameter rather than a global:

``````def binary_search(mlist, item, start=0, end=None):
if end is None:
end = len(mlist)
if start >= end:
return start
mid = (start + end) // 2
if item >= mlist[mid]:
return binary_search(mlist, item, mid + 1, end)
else:
return binary_search(mlist, item, start, mid)
``````

Use `binary_search(mlist, value)` to call it, the start and end values are filled in for you.

Now you can conduct a proper timed test:

``````>>> import random, timeit
>>> values = [random.randint(0, 100) for _ in range(100)]
>>> def binsort(l):
...     mlist = []
...     for i in l:
...         index = binary_search(0, len(mlist), i, mlist)
...         mlist.insert(index, i)
...     return mlist
...
>>> count, time = timeit.Timer('binsort(values)', 'from __main__ import values, binsort').autorange()
>>> format(time / count, '.15f')
'0.000146628010299'
>>> count, time = timeit.Timer('sorted(values)', 'from __main__ import values').autorange()
>>> format(time / count, '.15f')
'0.000008379445840'
>>> values = [random.randint(0, 100) for _ in range(1000)]
>>> count, time = timeit.Timer('binsort(values)', 'from __main__ import values, binsort').autorange()
>>> format(time / count, '.15f')
'0.002460538140149'
>>> count, time = timeit.Timer('sorted(values)', 'from __main__ import values').autorange()
>>> format(time / count, '.15f')
'0.000144566002200'
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download