MAA MAA - 1 year ago 207
Python Question

What is this bucket sort implementation doing?

This is my code for bucket sort in Python.

from random import randrange

def insertion_sort(aList):
for i in range(1, len(aList)):
for j in range(i, 0, -1):
if aList[j] < aList[j-1]:
aList[j], aList[j-1] = aList[j-1], aList[j]
return aList

def bucket_sort(aList):
buckets = [[]] * len(aList)
for index, value in enumerate(aList):
buckets_index = value * len(aList) // (max(aList) + 1)

answer = []

for bucket in buckets:
# answer += insertion_sort(bucket)

# return answer

aList = [randrange(10) for _ in range(100)]
answer = bucket_sort(aList)

What is happening? When I run the code, I always find that the first list in buckets is already sorted and the other lists in buckets are all copies of it.
Do I need the insertion sort for each list?
What would I use the "answer" variable for?!

I'm mainly relying on this visualization.

Rob Rob
Answer Source

One thing that i notice right off the bat is that you initialize your variable buckets as buckets = [[]] * len(aList). This makes a list of identical copies of the empty list. As such, any modification of this list is replicated in every element of buckets. Change this line to:

buckets =  [[] for _ in xrange(len(aList))]

To check if the lists inside the list are separate object, you could check their id's:

print [id(x) for x in buckets]

This should print a list of unique numbers.