d.griner - 2 months ago
Python Question

Getting child attributes from an XML document using element tree

I have an XML POM file like the following:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.amirsys</groupId>
<artifactId>components-parent</artifactId>
<version>RELEASE</version>
</parent>
<artifactId>statdxws</artifactId>
<version>6.5.0-16</version>
<packaging>war</packaging>
<dependencies>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>9.4-1200-jdbc41</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.amirsys</groupId>
<artifactId>referencedb</artifactId>
<version>5.0.0-1</version>
<exclusions>
<exclusion>
<groupId>com.amirsys</groupId>
<artifactId>jig</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>




I am trying to pull the groupIds, artifactIds and versions using element tree to create a dependency object, but it won't find the dependency tags. This is my code so far:

tree = ElementTree.parse('pomFile.xml')
root = tree.getroot()
namespace = '{http://maven.apache.org/POM/4.0.0}'
for dependency in root.iter(namespace+'dependency'):
    groupId = dependency.get('groupId')
    artifactId = dependency.get('artifactId')
    version = dependency.get('version')
    print groupId, artifactId, version


This outputs nothing, and I can't figure out why the code isn't iterating through the dependency tag. Any help would be appreciated.

Answer

Your XML has a small mistake. There should be a closing tag </project> which you probably missed in the question.
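As a quick check of why the closing tag matters, here is a minimal Python 3 sketch. The two strings are cut-down stand-ins for your POM (not the real file), and the parses() helper is a name made up for this illustration. ElementTree raises ParseError when the root element is never closed:

```python
from xml.etree import ElementTree

# Cut-down stand-ins for the POM, not the real file.
complete = '<project><artifactId>statdxws</artifactId></project>'
truncated = '<project><artifactId>statdxws</artifactId>'


def parses(xml_text):
    """Return True if ElementTree can parse xml_text, False otherwise."""
    try:
        ElementTree.fromstring(xml_text)
        return True
    except ElementTree.ParseError:
        return False


print(parses(complete))   # True
print(parses(truncated))  # False: <project> is never closed
```

So if the file on disk is also missing </project>, ElementTree.parse() should fail with an error rather than silently print nothing.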

The following works for me:

from xml.etree import ElementTree
tree = ElementTree.parse('pomFile.xml')
root = tree.getroot()
namespace = '{http://maven.apache.org/POM/4.0.0}'
for dependency in root.iter(namespace+'dependency'):
    groupId = dependency.find(namespace+'groupId').text
    artifactId = dependency.find(namespace+'artifactId').text
    version = dependency.find(namespace+'version').text
    print groupId, artifactId, version

$ python -i a.py
org.postgresql postgresql 9.4-1200-jdbc41
com.amirsys referencedb 5.0.0-1
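If you would rather not concatenate the namespace by hand, a possible alternative is to pass a prefix map to findall() and findtext(). This is only a sketch for Python 3: the prefix 'm', the read_dependencies() helper and the inline POM are assumptions made for the example, and the second dependency has had its version removed on purpose to show what happens when a child element is absent.

```python
from xml.etree import ElementTree

# Shortened, modified copy of the POM from the question (illustration only).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>org.postgresql</groupId>
      <artifactId>postgresql</artifactId>
      <version>9.4-1200-jdbc41</version>
    </dependency>
    <dependency>
      <groupId>com.amirsys</groupId>
      <artifactId>referencedb</artifactId>
    </dependency>
  </dependencies>
</project>"""

# The prefix 'm' is arbitrary; only the URI has to match the document.
NS = {'m': 'http://maven.apache.org/POM/4.0.0'}


def read_dependencies(xml_text):
    """Return a list of (groupId, artifactId, version) tuples."""
    root = ElementTree.fromstring(xml_text)
    result = []
    for dep in root.findall('m:dependencies/m:dependency', NS):
        # findtext() returns None when the child element is missing,
        # whereas find(...).text would raise AttributeError.
        result.append((
            dep.findtext('m:groupId', namespaces=NS),
            dep.findtext('m:artifactId', namespaces=NS),
            dep.findtext('m:version', namespaces=NS),
        ))
    return result


for group_id, artifact_id, version in read_dependencies(POM):
    print(group_id, artifact_id, version)
```

Note that this path only matches dependency elements directly under the top-level dependencies element, while root.iter() walks the whole tree. Which one you want depends on your POM.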

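Your usage of .get() is wrong. .get() reads an attribute of the element itself, but groupId, artifactId and version are child elements of dependency, so you need .find() and .text as above. To see how .get() works, let's say your XML is: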

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

And you write Python code like:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()
for country in root.findall('country'):
    rank = country.find('rank').text
    name = country.get('name')
    print name, rank

This will print:

Liechtenstein 1
Singapore 4
Panama 68

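As you can see, .get() returns the value of an attribute (name on each country element), while .find(...).text returns the text of a child element (rank). The ElementTree docs are pretty clear on this.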
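To make the difference concrete, here is a small Python 3 sketch on a single country element (trimmed from the sample above). Mixing up the two calls gives None rather than an error:

```python
import xml.etree.ElementTree as ET

# One element from the sample data, trimmed for the example.
country = ET.fromstring('<country name="Panama"><rank>68</rank></country>')

print(country.get('name'))        # Panama (attribute value)
print(country.find('rank').text)  # 68 (text of the child element)

# Swapping the two lookups finds nothing:
print(country.get('rank'))        # None, rank is a child, not an attribute
print(country.find('name'))       # None, name is an attribute, not a child
```

That is what happens with dependency.get('groupId') on a well-formed POM: the dependency element has no groupId attribute, so the result is None.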