significance significance - 1 year ago 69
Python Question

How do I rewrite this function to use OrderedDict?

I have the following function which does a crude job of parsing an XML file into a dictionary.

Unfortunately, since Python dictionaries are not ordered, I am unable to cycle through the nodes as I would like.

How do I change this so that it outputs an ordered dictionary that reflects the original order of the nodes when looped over with 'for'?

def simplexml_load_file(file):
    import collections
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = None
        if el.text:
            item = el.text
        child_dicts = collections.defaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return dict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')

print x

for y in x['root']:
    print y


outputs...

{'root': {
'a': ['1'],
'aa': [{'b': [{'c': ['2']}, '2']}],
'aaaa': [{'bb': ['4']}],
'aaa': ['3'],
'aaaaa': ['5']
}}

a
aa
aaaa
aaa
aaaaa


How can I use collections.OrderedDict so that I can be sure of getting the nodes in the correct order?

xml for reference...

<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>

Answer Source

You could use the OrderedDict dict subclass, which was added to the standard library's collections module in version 2.7*. What you actually need is a combination of OrderedDict and defaultdict, which doesn't exist, but you can create one by subclassing OrderedDict as illustrated below:

import collections

class OrderedDefaultdict(collections.OrderedDict):
    """ A defaultdict with OrderedDict as its base class. """

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory  # called by __missing__()

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key,)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):  # optional, for pickle support
        args = (self.default_factory,) if self.default_factory else tuple()
        # items() works on Python 2 and 3; pickle expects an iterator here
        return self.__class__, args, None, None, iter(self.items())

    def __repr__(self):  # optional
        return '%s(%r, %r)' % (self.__class__.__name__, self.default_factory,
                               list(self.items()))

def simplexml_load_file(file):
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = el.text or None
        child_dicts = OrderedDefaultdict(list)
        for child in el:  # iterating an element yields its children in order
            child_dicts[child.tag].append(xml_to_item(child))
        return collections.OrderedDict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')
print(x)

for y in x['root']:
    print(y)

The output produced from your test XML file looks like this:

{'root': 
    OrderedDict(
        [('a', ['1']), 
         ('aa', [OrderedDict([('b', [OrderedDict([('c', ['2'])]), '2'])])]), 
         ('aaa', ['3']), 
         ('aaaa', [OrderedDict([('bb', ['4'])])]), 
         ('aaaaa', ['5'])
        ]
    )
}

a
aa
aaa
aaaa
aaaaa

I think this is close to what you want.
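If you want to try the idea without lxml or a file on disk, here is a minimal sketch under a few assumptions: it swaps in the standard library's xml.etree.ElementTree, parses an inline copy of the question's XML, and uses OrderedDict.setdefault() in place of the custom subclass. It is a variation for experimenting, not the code above.

```python
import collections
import xml.etree.ElementTree as ET

# Inline copy of the question's test XML.
XML = """<root>
  <a>1</a>
  <aa>
    <b>
      <c>2</c>
    </b>
    <b>2</b>
  </aa>
  <aaa>3</aaa>
  <aaaa>
    <bb>4</bb>
  </aaaa>
  <aaaaa>5</aaaaa>
</root>"""


def xml_to_item(el):
    """Return the children as an OrderedDict of lists, else the element's text."""
    children = collections.OrderedDict()
    for child in el:  # iterating an element yields its children in document order
        # setdefault() stands in for the defaultdict(list) behaviour
        children.setdefault(child.tag, []).append(xml_to_item(child))
    return children or el.text or None


root = ET.fromstring(XML)
result = {root.tag: xml_to_item(root)}
tag_order = list(result['root'])
print(tag_order)
```

Because each tag is inserted the first time it is seen, looping over result['root'] visits the tags in the order they first appear in the document.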

*If your version of Python doesn't have OrderedDict, which was introduced in v2.7, you may be able to use Raymond Hettinger's Ordered Dictionary for Py2.4 ActiveState recipe as a base class instead.

Minor update:

Added a __reduce__() method, which allows instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but it came up in a similar one.
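As a rough check of the pickle support, the sketch below round-trips an instance through pickle. It repeats a condensed copy of the class, with callable() and items() used so that it runs on Python 3, and it assumes the code is run as a script, since pickle stores classes by reference and needs to find OrderedDefaultdict at module level.

```python
import collections
import pickle


class OrderedDefaultdict(collections.OrderedDict):
    """Condensed copy of the class above, written to run on Python 3."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        args = (self.default_factory,) if self.default_factory else tuple()
        # The fifth item must be an iterator of (key, value) pairs.
        return self.__class__, args, None, None, iter(self.items())


original = OrderedDefaultdict(list)
original['b'].append(1)  # inserted first, so it should stay first
original['a'].append(2)

restored = pickle.loads(pickle.dumps(original))
restored_items = list(restored.items())

# The default factory should survive too, so missing keys still get a list.
restored['c'].append(3)
print(restored_items, restored['c'])
```

The restored object should keep both the insertion order and the default factory, which is the information __reduce__() hands to pickle.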
