cmj29607 cmj29607 - 2 months ago 14
Python Question

parse a section of an XML file with python

Im new to both python and xml. Have looked at the previous posts on the topic, and I cant figure out how to do exactly what I need to. Although it seems to be simple enough in principle.

<Project>
<Items>
<Item>
<Code>A456B</Code>
<Database>
<Data>
<Id>mountain</Id>
<Value>12000</Value>
</Data>
<Data>
<Id>UTEM</Id>
<Value>53.2</Value>
</Data>
</Database>
</Item>
<Item>
<Code>A786C</Code>
<Database>
<Data>
<Id>mountain</Id>
<Value>5000</Value>
</Data>
<Data>
<Id>UTEM</Id>
<Value></Value>
</Data>
</Database>
</Item>
</Items>
</Project>


All I want to do is extract all of the Codes, Values and ID's, which is no problem.

import xml.etree.cElementTree as ET

name = 'example tree.xml'
tree = ET.parse(name)
root = tree.getroot()
codes=[]
ids=[]
val=[]
for db in root.iter('Code'):
codes.append(db.text)
for ID in root.iter('Id'):
ids.append(ID.text)
for VALUE in root.iter('Value'):
val.append(VALUE.text)
print codes
print ids
print val

['A456B', 'A786C']
['mountain', 'UTEM', 'mountain', 'UTEM']
['12000', '53.2', '5000', None]


I want to know which Ids and Values go with which Code. Something like a dictionary of dictionaries maybe OR perhaps a list of DataFrames with the row index being the Id, and the column header being Code.

for example

A456B = {mountain:12000, UTEM:53.2}

A786C = {mountain:5000, UTEM: None}

Eventually I want to use the Values to feed an equation.

Note that the real xml file might not contain the same number of Ids and Values in each Code. Also, Id and Value might be different from one Code section to another.

Sorry if this question is elementary, or unclear...I've only been doing python for a month :/

Boa Boa
Answer

BeautifulSoup is a very useful module for parsing HTML and XML.

from bs4 import BeautifulSoup
import os

# read the file into a BeautifulSoup object
soup = BeautifulSoup(open(os.getcwd() + "\\input.txt"))

results = {}

# parse the data, and put it into a dict, where the values are dicts
for item in soup.findAll('item'):
    # assemble dicts on the fly using a dict comprehension:
    # http://stackoverflow.com/a/14507637/4400277
    results[item.code.text] = {data.id.text:data.value.text for data in item.findAll('data')}

>>> results
{u'A786C': {u'mountain': u'5000', u'UTEM': u''}, 
 u'A456B': {u'mountain': u'12000', u'UTEM': u'53.2'}