jaydh jaydh - 1 year ago 59

Python - Sorting in ascending order in a txt file

I had a huge document that I parsed using regex to give a txt file (json.dump) similar to the following:

"stuff": [
"name": [
"number": 11300,
"identifier": "Tsdsad"
"name": [
"number": 117900,
"identifier": "Pdfms"
name: [
"number": 660,
"identifier": "Unnamed"

Now I would like to sort this document in ascending order based on the number (i.e. "Pdfms" first, "Tsdsad" second, "Unnamed" third). I am unsure how to start this in Python. Could anyone point me in the right direction? Thanks in advance.

Answer Source

First problem: that's not valid JSON. The source has trailing commas (JSON doesn't accept [a,b,c,]; it insists on [a,b,c]), and some identifiers (the third instance of name, for example) are not quoted. Ideally, you would improve your initial text file parsing and JSON generation to fix those issues. Alternatively, you can handle those fix-ups on the fly, like this:

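import re

json_source = """
    ... your text data from above ...
"""

# Remove a trailing comma (followed by whitespace) before a closing bracket
BADCOMMA = re.compile(r',\s+\]')
json_source = BADCOMMA.sub(']', json_source)

# Quote the bare identifier: name: becomes "name":
BADIDENTIFIER = re.compile(r'\s+name:\s*')
json_source = BADIDENTIFIER.sub('"name":', json_source)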
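
To see the two fix-ups at work, here is a small self-contained sketch. The `broken` snippet is made-up sample text with one trailing comma and one unquoted name, not the asker's actual file:

```python
import json
import re

# Made-up sample: trailing comma after "Fast", and name is not quoted.
broken = """{"stuff": [
    {
        name: [
            "Fast",
        ],
        "number": 660,
        "identifier": "Unnamed"
    }
]}"""

BADCOMMA = re.compile(r',\s+\]')
BADIDENTIFIER = re.compile(r'\s+name:\s*')

# Apply both substitutions, then parse the repaired text.
repaired = BADCOMMA.sub(']', broken)
repaired = BADIDENTIFIER.sub('"name":', repaired)
fixed = json.loads(repaired)
```

Calling `json.loads(broken)` directly would raise an error; after the two substitutions the text parses.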

Beware: assuming you can fix every possible problem on the fly is fragile, and so is editing structured data files with regular expressions. It is better to generate good JSON from the get-go.
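
As a rough sketch of what generating good JSON from the start could look like: collect the parsed values into ordinary Python lists and dicts, then let the json module do the serialization, so quoting and commas are always correct. The `records` list below is hypothetical sample data mirroring the example; the regex parsing step that would produce it is not shown.

```python
import json

# Hypothetical parsed records; in practice the parsing step would
# build these up instead of having them written out by hand.
records = [
    {"name": ["frfer", "niddsi"], "number": 11300, "identifier": "Tsdsad"},
    {"name": ["Fast", "Guard", "Named"], "number": 117900, "identifier": "Pdfms"},
    {"name": ["Fast"], "number": 660, "identifier": "Unnamed"},
]

# json.dumps takes care of quoting and separators.
json_text = json.dumps({"stuff": records}, indent=4)

# Round trip: loading the text gives back the same structure.
roundtrip = json.loads(json_text)
```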

Now, how to sort:

import json
data = json.loads(json_source)

data['stuff'].sort(key=lambda item: item['number'], reverse=True)

That does an in-place sort of the "stuff" array by the "number" value, in reverse (descending) order, because the order you list in your example runs from largest to smallest rather than the usual ascending order.
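
If the original order should be kept around, `sorted()` returns a new list instead of sorting in place, and dropping `reverse=True` gives a true ascending sort. A small sketch, where the `data` literal is an assumption standing in for the parsed file:

```python
from operator import itemgetter

# Stand-in for the parsed file, in the order given in the question.
data = {"stuff": [
    {"number": 11300, "identifier": "Tsdsad"},
    {"number": 117900, "identifier": "Pdfms"},
    {"number": 660, "identifier": "Unnamed"},
]}

# sorted() leaves data["stuff"] untouched and returns a new list.
ascending = sorted(data["stuff"], key=itemgetter("number"))
descending = sorted(data["stuff"], key=itemgetter("number"), reverse=True)

print([item["identifier"] for item in ascending])   # ['Unnamed', 'Tsdsad', 'Pdfms']
print([item["identifier"] for item in descending])  # ['Pdfms', 'Tsdsad', 'Unnamed']
```

`itemgetter("number")` does the same job as the lambda above; which one to use is a matter of taste.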

To demonstrate that the sort has done what you want, the pprint module can be handy:

from pprint import pprint
pprint(data)

{u'stuff': [{u'identifier': u'Pdfms',
             u'name': [u'Fast', u'Guard', u'Named'],
             u'number': 117900},
            {u'identifier': u'Tsdsad',
             u'name': [u'frfer', u'niddsi'],
             u'number': 11300},
            {u'identifier': u'Unnamed', u'name': [u'Fast'], u'number': 660}]}
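
Since the goal was a sorted txt file, the sorted structure can be written back out with `json.dump`. This is only a sketch: `save_json` is a hypothetical helper, the `data` literal stands in for the sorted result, and the output path is a placeholder in the temp directory.

```python
import json
import os
import tempfile

def save_json(data, path):
    """Write data to path as indented JSON text."""
    with open(path, "w") as handle:
        json.dump(data, handle, indent=4)

# Stand-in for the already sorted data.
data = {"stuff": [
    {"number": 117900, "identifier": "Pdfms"},
    {"number": 11300, "identifier": "Tsdsad"},
    {"number": 660, "identifier": "Unnamed"},
]}

# Placeholder output location; substitute the real file name.
out_path = os.path.join(tempfile.gettempdir(), "sorted_stuff.txt")
save_json(data, out_path)

# Reading the file back gives the same structure, in the same order.
with open(out_path) as handle:
    reloaded = json.load(handle)
```

The list order survives the round trip, because JSON arrays are ordered.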