Matt Matt - 6 months ago 32
Python Question

Trouble parsing XML with Python

I have parsed an XML file with BeautifulSoup in Python and I am having trouble extracting the data out of it. An example of the structure of the XML is below:

<Products page="0" pages="-1" records="27">
  <Product id="ABC001">
    <Name>This product name</Name>
    <Class id="USD"/>
  </Product>
  <Product id="XYZ002">
    <Name>That product name</Name>
    <Tag>More Text</Tag>
    <Class id="EUR"/>
  </Product>
</Products>

The first thing I have been trying to accomplish, but have so far failed to do, is to extract all of the Product and Class ids.

What I have tried is:

products = soup.find_all("Product")

for p in products:
    print(p.find("name"))  # gets the name tag
    print(p.find("cur"))  # gets the cur tag
    # ...etc

However, I can't figure out how to access the id values themselves. For example, I want ABC001 from <Product id="ABC001"> and USD from <Class id="USD">.

Note that while I am using bs4 I don't have to - it's just that I have done a lot of web scraping with Python + bs4 and have found bs4 to be useful in navigating through HTML, so I assumed it would be the ideal way of handling XML.


id is an attribute of Product, not a child element, so you access it with:
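
In bs4, attributes are read by subscripting the tag, p["id"], or with p.get("id") if the attribute might be missing. Since the question says bs4 is optional, here is a minimal sketch of the same attribute lookup using only the standard library's xml.etree.ElementTree. It assumes a well-formed version of the sample: the closing tags, the nesting of Class inside Product, and the names xml_text and pairs are my assumptions, not part of the original file.

```python
import xml.etree.ElementTree as ET

# Well-formed stand-in for the sample in the question.
# Closing tags and nesting are assumed; adjust to match the real file.
xml_text = """
<Products page="0" pages="-1" records="27">
  <Product id="ABC001">
    <Name>This product name</Name>
    <Class id="USD"/>
  </Product>
  <Product id="XYZ002">
    <Name>That product name</Name>
    <Tag>More Text</Tag>
    <Class id="EUR"/>
  </Product>
</Products>
""".strip()

root = ET.fromstring(xml_text)

pairs = []
for product in root.findall("Product"):
    # Attributes live on the element itself, so use .get(), not .find().
    product_id = product.get("id")

    # Class is a child element; find() returns None if it is absent.
    cls = product.find("Class")
    class_id = cls.get("id") if cls is not None else None

    pairs.append((product_id, class_id))

print(pairs)
```

The point in either library is the same: find() and find_all() look for child elements, while id sits on the tag itself, so it has to be read as an attribute.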