significance - 3 months ago
Python Question

How do I rewrite this function to use OrderedDict?

I have the following function which does a crude job of parsing an XML file into a dictionary.

Unfortunately, since Python dictionaries are not ordered, I am unable to cycle through the nodes as I would like.

How do I change this so that it outputs an ordered dictionary which reflects the original order of the nodes when looped over with 'for'?

def simplexml_load_file(file):
    import collections
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = None
        if el.text:
            item = el.text
        child_dicts = collections.defaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return dict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')

print x

for y in x['root']:
    print y


outputs...

{'root': {
'a': ['1'],
'aa': [{'b': [{'c': ['2']}, '2']}],
'aaaa': [{'bb': ['4']}],
'aaa': ['3'],
'aaaaa': ['5']
}}

a
aa
aaaa
aaa
aaaaa


How can I use collections.OrderedDict so that I can be sure of getting the nodes in the correct order?

XML for reference:

<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>

Answer

You could use the new OrderedDict dict subclass which was added to the standard library's collections module in version 2.7*. Actually what you need is an Ordered+defaultdict combination which doesn't exist—but it's possible to create one by subclassing OrderedDict as illustrated below:

import collections

class OrderedDefaultdict(collections.OrderedDict):
    """ A defaultdict with OrderedDict as its base class. """

    def __init__(self, default_factory=None, *args, **kwargs):
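        # callable() works on Python 2.x and 3.2+; collections.Callable
        # was removed from the collections namespace in Python 3.10.
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')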
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory  # called by __missing__()

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key,)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):  # optional, for pickle support
        args = (self.default_factory,) if self.default_factory else tuple()
        # iter(self.items()) works on both Python 2 and 3 (iteritems() is 2-only)
        return self.__class__, args, None, None, iter(self.items())

    def __repr__(self):  # optional
        return '%s(%r, %r)' % (self.__class__.__name__, self.default_factory,
                               list(self.items()))

def simplexml_load_file(file):
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = el.text or None
        child_dicts = OrderedDefaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return collections.OrderedDict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')
print(x)

for y in x['root']:
    print(y)

The output produced from your test XML file looks like this:

{'root': 
    OrderedDict(
        [('a', ['1']), 
         ('aa', [OrderedDict([('b', [OrderedDict([('c', ['2'])]), '2'])])]), 
         ('aaa', ['3']), 
         ('aaaa', [OrderedDict([('bb', ['4'])])]), 
         ('aaaaa', ['5'])
        ]
    )
}

a
aa
aaa
aaaa
aaaaa

Which I think is close to what you want.

*If your version of Python doesn't have OrderedDict, which was introduced in v2.7, you may be able to use Raymond Hettinger's Ordered Dictionary for Py2.4 ActiveState recipe as a base class instead.
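
On the version question more generally: from Python 3.7 onward the built-in dict is documented to preserve insertion order, so on those versions a plain collections.defaultdict may already give the ordering asked for. The following is only a sketch under a few assumptions: it swaps lxml for the standard library's xml.etree.ElementTree so it runs without third-party packages, it parses the question's XML from a string rather than from 'routines/test.xml', and simplexml_load_string is a hypothetical name chosen for illustration.

```python
import collections
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

# The XML from the question, inlined so the sketch is self-contained.
XML = """<root>
<a>1</a>
<aa>
<b>
<c>2</c>
</b>
<b>2</b>
</aa>
<aaa>3</aaa>
<aaaa>
<bb>4</bb>
</aaaa>
<aaaaa>5</aaaaa>
</root>"""


def xml_to_item(el):
    # Group child results by tag. On Python 3.7+ the defaultdict keeps
    # tags in the order in which each tag was first seen.
    children = collections.defaultdict(list)
    for child in el:
        children[child.tag].append(xml_to_item(child))
    # Elements with children become a dict; leaf elements become their text.
    return dict(children) or el.text or None


def simplexml_load_string(text):  # hypothetical helper name
    root = ET.fromstring(text)
    return {root.tag: xml_to_item(root)}


result = simplexml_load_string(XML)
print(list(result['root']))  # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
```

On Python 2.7, or on any version where ordering must not depend on the behaviour of the built-in dict, the OrderedDefaultdict class above remains the safer choice.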

Minor update:

Added a __reduce__() method which allows instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but it came up in a similar one.
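
To illustrate what __reduce__() buys you, here is a small pickle round trip. It is a sketch rather than a definitive test: the class is a condensed copy of the one above, written in Python 3 syntax (super() without arguments, callable(), items()), and the keys and values are made up for the example.

```python
import collections
import pickle


class OrderedDefaultdict(collections.OrderedDict):
    """Condensed copy of the class above, written to run on Python 3."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        # Rebuild as cls(default_factory), then re-insert the items in order.
        args = (self.default_factory,) if self.default_factory else ()
        return self.__class__, args, None, None, iter(self.items())


original = OrderedDefaultdict(list)
original['b'].append(1)
original['a'].append(2)
original['b'].append(3)

# Round trip through pickle: order and default_factory should both survive.
restored = pickle.loads(pickle.dumps(original))
print(list(restored.items()))  # [('b', [1, 3]), ('a', [2])]
print(restored.default_factory is list)  # True
```

Without the override, unpickling would go through OrderedDict's own reduction, which knows nothing about the default_factory constructor argument.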