transilvlad - 2 years ago
Python Question

xml.etree.ElementTree get node depth

The XML:

<?xml version="1.0"?>
<pages>
    <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
            <page>
                <url>http://example.com/Labs/Email</url>
                <title>Email</title>
                <subpages>
                    <page>
                        <url>http://example.com/Labs/Email/How_to</url>
                        <title>How-To</title>
                        <subpages/>
                    </page>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Labs/Social</url>
                <title>Social</title>
                <subpages/>
            </page>
        </subpages>
    </page>
    <page>
        <url>http://example.com/Tests</url>
        <title>Tests</title>
        <subpages>
            <page>
                <url>http://example.com/Tests/Email</url>
                <title>Email</title>
                <subpages>
                    <page>
                        <url>http://example.com/Tests/Email/How_to</url>
                        <title>How-To</title>
                        <subpages/>
                    </page>
                </subpages>
            </page>
            <page>
                <url>http://example.com/Tests/Social</url>
                <title>Social</title>
                <subpages/>
            </page>
        </subpages>
    </page>
</pages>


The code:

# rexml is the XML string read from a URL
from xml.etree import ElementTree as ET

tree = ET.fromstring(rexml)
for node in tree.iter('page'):
    for url in node.iterfind('url'):
        print(url.text)
    for title in node.iterfind('title'):
        print(title.text)
    print('-' * 30)


The output:

http://example.com/article1
Article1
------------------------------
http://example.com/article1/subarticle1
SubArticle1
------------------------------
http://example.com/article2
Article2
------------------------------
http://example.com/article3
Article3
------------------------------


The XML represents the tree-like structure of a sitemap.

I have been up and down the docs and Google all day and can't figure out how to get the node depth of each entry.

I tried counting the children of each container, but that only works for the first parent and then breaks, because I can't figure out how to reset the count. That is probably just a hackish idea anyway.

The desired output:

0
http://example.com/article1
Article1
------------------------------
1
http://example.com/article1/subarticle1
SubArticle1
------------------------------
0
http://example.com/article2
Article2
------------------------------
0
http://example.com/article3
Article3
------------------------------

Answer Source

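I ended up using lxml.html. Unlike xml.etree.ElementTree elements, lxml elements expose getparent(), so the depth can be found by walking up from a node to the root.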

import lxml.html

rexml = ...

def depth(node):
    # Count the node itself plus every ancestor by following getparent()
    # until it returns None, which happens once we step past the root.
    d = 0
    while node is not None:
        d += 1
        node = node.getparent()
    return d

tree = lxml.html.fromstring(rexml)
for node in tree.iter('page'):
    print(depth(node))
    for url in node.iterfind('url'):
        print(url.text)
    for title in node.iterfind('title'):
        print(title.text)
    print('-' * 30)
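
If installing lxml is not an option, the same result can probably be had with the standard library alone. The sketch below is an alternative to the answer above, not the method it uses: because xml.etree.ElementTree elements have no parent pointer, it carries the depth down while recursing instead of walking up. It assumes every child page sits inside its parent's <subpages> element, and the names walk_pages and SAMPLE are made up for illustration (SAMPLE stands in for the rexml string).

```python
from xml.etree import ElementTree as ET

# Hypothetical sample data standing in for the sitemap; in the question
# the document would come from the `rexml` string instead.
SAMPLE = """<?xml version="1.0"?>
<pages>
  <page>
    <url>http://example.com/Labs</url>
    <title>Labs</title>
    <subpages>
      <page>
        <url>http://example.com/Labs/Email</url>
        <title>Email</title>
        <subpages/>
      </page>
    </subpages>
  </page>
  <page>
    <url>http://example.com/Tests</url>
    <title>Tests</title>
    <subpages/>
  </page>
</pages>"""


def walk_pages(container, depth=0):
    """Yield (depth, page) for each <page> under `container`, parents first."""
    # findall('page') matches direct children only, so nested pages are
    # reached solely through the recursive call below.
    for page in container.findall('page'):
        yield depth, page
        subpages = page.find('subpages')
        if subpages is not None:
            # Children of this page sit one level deeper.
            yield from walk_pages(subpages, depth + 1)


root = ET.fromstring(SAMPLE)
for depth, page in walk_pages(root):
    print(depth)
    print(page.findtext('url'))
    print(page.findtext('title'))
    print('-' * 30)
```

With the sample data this reports depth 0 for Labs, 1 for Email and 0 for Tests, which matches the numbering in the desired output. Only <page> nesting is counted, so the <subpages> containers do not inflate the depth.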