dingo_d - 10 months ago 39

Python Question

I am trying to merge several lists into one sorted list. They contain the numbers and names of sections, subsections and subsubsections. The program looks like this:

```
import heapq

sections = ['1. Section', '2. Section', '3. Section', '4. Section', '5. Section', '6. Section', '7. Section', '8. Section', '9. Section', '10. Section', '11. Section', '12. Section']

subsections = ['1.1 Subsection', '1.2 Subsection', '1.3 Subsection', '1.4 Subsection', '2.1 Subsection', '4.1 My subsection', '7.1 Subsection', '8.1 Subsection', '12.1 Subsection']

subsubsections = ['1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.4.1 Subsubsection', '2.1.1 Subsubsection', '7.1.1 Subsubsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection']

sorted_list = list(heapq.merge(sections, subsections, subsubsections))

print(sorted_list)
```

What I get out is this:

`['1. Section', '1.1 Subsection', '1.2 Subsection', '1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.3 Subsection', '1.4 Subsection', '1.4.1 Subsubsection', '2. Section', '2.1 Subsection', '2.1.1 Subsubsection', '3. Section', '4. Section', '4.1 My subsection', '5. Section', '6. Section', '7. Section', '7.1 Subsection', '7.1.1 Subsubsection', '8. Section', '8.1 Subsection', '12.1 Subsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection', '9. Section', '10. Section', '11. Section', '12. Section']`

The subsection and subsubsection of my 12th section end up inside the 8th section, not the 12th.

Why is this happening? The original lists are sorted, and it all goes well, apparently up to number 10.

I'm not sure why this is happening. Is there a better way to sort this into a 'tree' based on the numbers in the lists? I'm building a table of contents of sorts, and this is what it currently returns (once I filter the list out):

```
1. Section
1.1 Subsection
1.2 Subsection
1.2.1 Subsubsection
1.2.2 Subsubsection
1.3 Subsection
1.4 Subsection
1.4.1 Subsubsection
2. Section
2.1 Subsection
2.1.1 Subsubsection
3. Section
4. Section
4.1 My subsection
5. Section
6. Section
7. Section
7.1 Subsection
7.1.1 Subsubsection
8. Section
8.1 Subsection
12.1 Subsection
8.1.1 Subsubsection
12.1.1 Subsubsection
9. Section
10. Section
11. Section
12. Section
```

Notice the 12.1 Subsection behind 8.1 Subsection and 12.1.1 Subsubsection behind 8.1.1 Subsubsection.

Answer

As explained in the other answer, you have to specify the sorting method; otherwise Python compares the strings lexicographically, so `'12.1 Subsection'` sorts before `'8.1.1 Subsubsection'` because the character `'1'` comes before `'8'`. If you are using Python 3.5+ you can pass the `key` argument to `heapq.merge`; in earlier versions you can use `itertools.chain` and `sorted`. As a general approach, you can use a regex to find the numbers and convert them to `int`:

```
In [18]: from itertools import chain
In [19]: import re
In [23]: sorted(chain.from_iterable((sections, subsections, subsubsections)),
    ...:        key=lambda x: [int(i) for i in re.findall(r'\d+', x)])
Out[23]:
['1. Section',
'1.1 Subsection',
'1.2 Subsection',
'1.2.1 Subsubsection',
'1.2.2 Subsubsection',
'1.3 Subsection',
'1.4 Subsection',
'1.4.1 Subsubsection',
'2. Section',
'2.1 Subsection',
'2.1.1 Subsubsection',
'3. Section',
'4. Section',
'4.1 My subsection',
'5. Section',
'6. Section',
'7. Section',
'7.1 Subsection',
'7.1.1 Subsubsection',
'8. Section',
'8.1 Subsection',
'8.1.1 Subsubsection',
'9. Section',
'10. Section',
'11. Section',
'12. Section',
'12.1 Subsection',
'12.1.1 Subsubsection']
```