user3833794 - 2 months ago
Python Question

Python read data from XML file

I use minidom to read my XML files, but it does not work with the following example: parsing fails with the error message shown further down.

I would like to retrieve the value in the <span> tag (101.86090), but I get an error.

This is the code:

from xml.dom import minidom

docXML = minidom.parse('/root/Desktop/tpage.xml')
node = docXML.getElementsByTagName('span')[0]
t = node.firstChild.data


This is the content of tpage.xml:

<span class="lp">
<span sys:innerhtml="{binding Last}"
sys:codeafter="$.quotebroker.setTitleProperties($dataItem, 'Last')">
101.86090
</span>
</span>


and this is the error message:

  File "minidomrecup.py", line 5, in <module>
    dom = parse('/root/Desktop/bot/tpage.xml')
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1920, in parse
    return expatbuilder.parse(file)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 924, in parse
    result = builder.parseFile(fp)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: unbound prefix: line 2, column 0

Answer

The XML shown is not namespace-well-formed: it uses a namespace prefix (sys) without declaring it, and the namespace-aware parser that minidom uses by default (the xml.dom.expatbuilder module) rejects it with "unbound prefix". To ignore namespaces, call expatbuilder directly and pass False as the second argument (namespaces) to its parse() function. Also, if you want the text node of the second <span>, your index is off by one, because the inner element is at index 1, not 0:

from xml.dom import expatbuilder


def main():
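    # Second argument is `namespaces`; False skips namespace processing,
    # so the undeclared "sys:" prefix no longer raises "unbound prefix".
    document = expatbuilder.parse('test.xml', False)
    # Index 0 is the outer <span>; the value sits in the inner one.
    node = document.getElementsByTagName('span')[1]
    # float() tolerates the surrounding whitespace and newlines.
    print(float(node.firstChild.data))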


if __name__ == '__main__':
    main()
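
To try this without a file on disk, the same idea can be sketched with expatbuilder.parseString(), which takes the same namespaces argument as parse(). This is only a minimal sketch: XML_TEXT is an inline copy of the markup from the question, and the helper names read_last_price and minidom_fails are hypothetical, made up for illustration.

```python
from xml.dom import expatbuilder, minidom
from xml.parsers.expat import ExpatError

# Inline copy of the question's markup, so no file on disk is needed.
XML_TEXT = """<span class="lp">
<span sys:innerhtml="{binding Last}"
sys:codeafter="$.quotebroker.setTitleProperties($dataItem, 'Last')">
101.86090
</span>
</span>"""


def read_last_price(xml_text):
    """Return the text of the second <span> as a float (hypothetical helper)."""
    # namespaces=False selects the builder that does no namespace processing,
    # so the undeclared "sys:" prefix is kept as part of the attribute name.
    document = expatbuilder.parseString(xml_text, False)
    # Index 0 is the outer <span>, index 1 the inner one holding the value.
    node = document.getElementsByTagName('span')[1]
    return float(node.firstChild.data)


def minidom_fails(xml_text):
    """Return True if the default, namespace-aware minidom parser rejects the text."""
    try:
        minidom.parseString(xml_text)
    except ExpatError:
        return True
    return False
```

With the markup above, minidom_fails(XML_TEXT) reports the original failure, while read_last_price(XML_TEXT) returns the value as a float. If you control the file, declaring the sys prefix on the root element would be the cleaner fix, since the document would then parse with plain minidom.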