MAA - 2 years ago 86
Python Question

# Merge sort a list of words

I have implemented the mergesort algorithm using a post on codereview. It's working nicely on a list of integers but I thought a more practical application is need. So I downloaded a text file with random english words and attempted to sort them.

However it does absolutely nothing.

``````def merge_sort(seq):
if len(seq) == 1:
return seq
else:
# recursive step. Break the list into chunks of 1
mid_index = len(seq) // 2
left  = merge_sort( seq[:mid_index] )
right = merge_sort( seq[mid_index:] )

left_counter, right_counter, master_counter = 0, 0, 0

while left_counter < len(left) and right_counter < len(right):
if left[left_counter] < right[right_counter]:
seq[master_counter] = left[left_counter]
left_counter += 1
else:
seq[master_counter] = right[right_counter]
right_counter += 1

master_counter += 1

# Handle the remaining items in the remaining_list
# Either left or right is done already, so only one of these two
#    loops will execute

while left_counter < len(left):  # left list isn't done yet
seq[master_counter] = left[left_counter]
left_counter   += 1
master_counter += 1

while right_counter < len(right):  # right list isn't done yet
seq[master_counter] = right[right_counter]
right_counter   += 1
master_counter  += 1

return seq
``````

I think the problem is that it's handling a list of lists instead of a single list. Also the function has no way of knowing what's the basis of sorting. Is that correct?

Here is how I want to call it

``````with open('words.txt') as f:
new = merge_sort(list_of_words)
print(new == sorted(list_of_words, key=len))
``````

As you identified the problem is that `merge_sort` has no way of knowing the basis of sorting. You could change `merge_sort` to take in an additional parameter that returns the key for each element in the sequence just like `sorted` does:

``````def merge_sort(seq, key=lambda x: x):
``````

And then change the comparison to call passed function instead of comparing elements directly:

``````if key(left[left_counter]) < key(right[right_counter]):
seq[master_counter] = left[left_counter]
left_counter += 1
else:
seq[master_counter] = right[right_counter]
right_counter += 1
``````

With these changes it will work as you expect:

``````merge_sort([4, 6, 2, 1]) # [1, 2, 4, 6]
merge_sort(['foo', 'a', 'bar', 'foobar'], key=len) # ['a', 'bar', 'foo', 'foobar']
``````

One thing to note though is that results are not identical with `sorted` since `merge_sort` is not stable:

``````merge_sort(['foo', 'a', 'bar', 'foobar'], key=len) # ['a', 'bar', 'foo', 'foobar']
sorted(['foo', 'a', 'bar', 'foobar'], key=len) # ['a', 'foo', 'bar', 'foobar']
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download