kloop kloop - 1 year ago 151
Python Question

how to recursively iterate over XML tags in Python using ElementTree?

I am trying to iterate over all nodes in a tree using ElementTree.

I do something like:

tree = ET.parse("/tmp/test.xml")

root = tree.getroot()

for child in root:
### do something with child

The problem is that child is an Element object and not ElementTree object, so I can't further look into it and recurse to iterate over its elements. Is there a way to iterate differently over "root" so that it iterates over the top level nodes in the tree (immediate children) and return the same class as root itself?

Answer Source

To iterate over all nodes, use the iter method on the ElementTree, not the root Element.

The root is an Element, just like the other elements in the tree and only really has context of its own attributes and children. The ElementTree has the context for all Elements.

For example, given this xml

<?xml version="1.0"?>
    <country name="Liechtenstein">
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    <country name="Panama">
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>

You can do the following

>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> for elem in tree.iter():
...     print elem
<Element 'data' at 0x10b2d7b50>
<Element 'country' at 0x10b2d7b90>
<Element 'rank' at 0x10b2d7bd0>
<Element 'year' at 0x10b2d7c50>
<Element 'gdppc' at 0x10b2d7d10>
<Element 'neighbor' at 0x10b2d7e90>
<Element 'neighbor' at 0x10b2d7ed0>
<Element 'country' at 0x10b2d7f10>
<Element 'rank' at 0x10b2d7f50>
<Element 'year' at 0x10b2d7f90>
<Element 'gdppc' at 0x10b2d7fd0>
<Element 'neighbor' at 0x10b2db050>
<Element 'country' at 0x10b2db090>
<Element 'rank' at 0x10b2db0d0>
<Element 'year' at 0x10b2db110>
<Element 'gdppc' at 0x10b2db150>
<Element 'neighbor' at 0x10b2db190>
<Element 'neighbor' at 0x10b2db1d0>