Matt - 1 year ago
Python Question

Trouble parsing XML with Python

I have parsed an XML file with BeautifulSoup in Python and I am having trouble extracting the data out of it. An example of the structure of the XML is below:

<Products page="0" pages="-1" records="27">
<Product id="ABC001">
<Name>This product name</Name>
<Class id="USD">
<Product id="XYZ002">
<Name>That product name</Name>
<Tag>More Text</Tag>
<Class id="EUR">

The first thing I have been trying to do, so far without success, is to extract all of the Product and Class ids.

What I have tried is:

products = soup.find_all("Product")

for p in products:
    print(p.find("name"))  # gets the name tag
    print(p.find("cur"))  # gets the cur tag
    # ...etc

However, I can't figure out how to access the id attribute of each Product and Class element.

Note that while I am using bs4, I don't have to. I have done a lot of web scraping with Python + bs4 and have found bs4 useful for navigating HTML, so I assumed it would be the ideal way of handling XML too.

Answer Source

id is an attribute of Product, not a child element, so you access it with:
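In bs4, attributes are read with dictionary-style access on the tag, for example p["id"] or p.get("id"). Since the question says bs4 is optional, here is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree. The sample XML is a well-formed variant of the one in the question: the closing tags and the nesting of Class inside Product are assumptions, and extract_ids is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Well-formed variant of the XML from the question. The closing tags and
# the nesting of Class inside Product are assumptions.
XML = """
<Products page="0" pages="-1" records="27">
  <Product id="ABC001">
    <Name>This product name</Name>
    <Class id="USD"/>
  </Product>
  <Product id="XYZ002">
    <Name>That product name</Name>
    <Tag>More Text</Tag>
    <Class id="EUR"/>
  </Product>
</Products>
"""


def extract_ids(xml_text):
    """Return a list of (product_id, class_id) pairs (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    pairs = []
    for product in root.findall("Product"):
        cls = product.find("Class")
        # Attributes are read with .get(); a missing attribute yields None.
        class_id = cls.get("id") if cls is not None else None
        pairs.append((product.get("id"), class_id))
    return pairs


ids = extract_ids(XML)
print(ids)
```

Note that ElementTree tag names are case-sensitive, so "Product" and "Class" must match the capitalisation used in the file.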
