dingo_d dingo_d - 2 months ago 5
Python Question

python heapq sorting list wrong?

I am trying to sort lists into one list that contain numbers and names of sections, sub sections and sub sub sections. The program looks like this:

import heapq

sections = ['1. Section', '2. Section', '3. Section', '4. Section', '5. Section', '6. Section', '7. Section', '8. Section', '9. Section', '10. Section', '11. Section', '12. Section']
subsections = ['1.1 Subsection', '1.2 Subsection', '1.3 Subsection', '1.4 Subsection', '2.1 Subsection', '4.1 My subsection', '7.1 Subsection', '8.1 Subsection', '12.1 Subsection']
subsubsections = ['1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.4.1 Subsubsection', '2.1.1 Subsubsection', '7.1.1 Subsubsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection']

sorted_list = list(heapq.merge(sections, subsections, subsubsections))

print(sorted_list)


What I get out is this:

['1. Section', '1.1 Subsection', '1.2 Subsection', '1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.3 Subsection', '1.4 Subsection', '1.4.1 Subsubsection', '2. Section', '2.1 Subsection', '2.1.1 Subsubsection', '3. Section', '4. Section', '4.1 My subsection', '5. Section', '6. Section', '7. Section', '7.1 Subsection', '7.1.1 Subsubsection', '8. Section', '8.1 Subsection', '12.1 Subsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection', '9. Section', '10. Section', '11. Section', '12. Section']


My 12th subsection, and sub sub section is located within 8th section, not 12th.

Why is this happening? The original lists are sorted, and it all goes good, apparently up to number 10.

I'm not sure why this is happening and is there a way to better sort this into a 'tree' based on the numbers in the lists? I'm building a table of contents of sorts, and this will return (once I filter the list out)

1. Section
1.1 Subsection
1.2 Subsection
1.2.1 Subsubsection
1.2.2 Subsubsection
1.3 Subsection
1.4 Subsection
1.4.1 Subsubsection
2. Section
2.1 Subsection
2.1.1 Subsubsection
3. Section
4. Section
4.1 My subsection
5. Section
6. Section
7. Section
7.1 Subsection
7.1.1 Subsubsection
8. Section
8.1 Subsection
12.1 Subsection
8.1.1 Subsubsection
12.1.1 Subsubsection
9. Section
10. Section
11. Section
12. Section


Notice the 12.1 Subsection behind 8.1 Subsection and 12.1.1 Subsubsection behind 8.1.1 Subsubsection.

Answer

As explained in other answer you have to specify that sorting method, otherwise python will sort the strings lexicographically. If you are using python 3.5+ you can use key argument in merge, in pyhton 3.5- you can use itertools.chain and sorted, and as a general approach you can use regex in order to find the numbers and convert them to int :

In [18]: from itertools import chain
In [19]: import re
In [23]: sorted(chain.from_iterable((sections, subsections, subsubsections)),
                key = lambda x: [int(i) for i in re.findall(r'\d+', x)])
Out[23]: 
['1. Section',
 '1.1 Subsection',
 '1.2 Subsection',
 '1.2.1 Subsubsection',
 '1.2.2 Subsubsection',
 '1.3 Subsection',
 '1.4 Subsection',
 '1.4.1 Subsubsection',
 '2. Section',
 '2.1 Subsection',
 '2.1.1 Subsubsection',
 '3. Section',
 '4. Section',
 '4.1 My subsection',
 '5. Section',
 '6. Section',
 '7. Section',
 '7.1 Subsection',
 '7.1.1 Subsubsection',
 '8. Section',
 '8.1 Subsection',
 '8.1.1 Subsubsection',
 '9. Section',
 '10. Section',
 '11. Section',
 '12. Section',
 '12.1 Subsection',
 '12.1.1 Subsubsection']
Comments