Greg Perry - 1 year ago
Python Question

Python BeautifulSoup Referencing Tag with space in XML tag

Apologies for my poorly formatted post and written code, first post! I'm sure this is a simple fix but cannot seem to figure it out.

Question 1: I'm writing an XML scraper for an Eve Online API. I need to iterate over an XML tag that has a space in it (type id="X") with BeautifulSoup. I would like to iterate over the item ID tags in the XML (2 in this example). I'm not sure whether Data.iter is the right approach. I know that print Data.buy will find the first buy tag in the XML and print its children, but I cannot find an equivalent way to have it print the first Data.'type id="x"'.

Question 2: Any direction on how one would continue the scrape process would be welcome. I was thinking about exporting the buy / sell / all to some sort of storage (was thinking CSV files but not positive) with different storages for the different item IDs and the buy / sell orders.

import requests #Used to service API connection
from lxml import html #Used to parse XML
from bs4 import BeautifulSoup #Used to read XML table on webpage

ItemTypeID1 = 34
ItemTypeID2 = 35
RegionID = 10000002

Webpage = requests.get('' % (ItemTypeID1, ItemTypeID2, RegionID))
#Check if page is up
if Webpage.status_code == 200:
    #Convert webpage to %Data
    Data = BeautifulSoup(Webpage.text, 'lxml')

    #Problem line
    for item in Data.iter('type id='):
        print 'something'

<?xml version='1.0' encoding='utf-8'?>
<evec_api version="2.0" method="marketstat_xml">
<marketstat><type id="34">
</type><type id="35">

Answer Source

type id= is not a tag. The tag name of the element is type and id is an attribute of that element.

for item in Data.find_all('type'):
    print item.get('id')

For the URL that you reference, this code will output:

34
35

The code simply finds all elements with tag name "type" and displays the id attribute of each tag found.
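
To try this without calling the API, here is a minimal sketch that runs the same find_all call on an inline sample. The sample XML is trimmed from the snippet in the question (closing tags are added so it is complete), and the built-in 'html.parser' stands in for 'lxml' only so the example runs without lxml installed; both are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Sample XML trimmed from the question's snippet, with closing tags added.
SAMPLE_XML = """<?xml version='1.0' encoding='utf-8'?>
<evec_api version="2.0" method="marketstat_xml">
<marketstat><type id="34">
</type><type id="35">
</type></marketstat>
</evec_api>"""

# 'html.parser' is used here only to avoid the lxml dependency.
soup = BeautifulSoup(SAMPLE_XML, 'html.parser')

# "type" is the tag name; "id" is read as an attribute of each matched tag.
type_ids = [item.get('id') for item in soup.find_all('type')]
print(type_ids)
```

Note that the ids come back as strings, so convert them with int() if you need to compare them with ItemTypeID1 and ItemTypeID2.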

You can access the data contained in the nested buy and sell tags:

for item in Data.find_all('type'):
    print item.get('id')
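    volume = item.buy.volume.text    # text of the volume tag nested under buy
    avg = item.buy.avg.text          # text of the avg tag nested under buy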
    # etc.

which shows how to get at the data contained in the volume and avg tags for each item.
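
Toward Question 2, one possible way to continue is to flatten each type's buy and sell statistics into rows and write them with the csv module. This is only a sketch under assumptions: it is written for Python 3, the sample XML and its numbers are made up for illustration, only volume and avg are read because those are the tags mentioned above, and the column layout and the helper name stats_rows are hypothetical.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up sample in the shape described above: <type> wraps <buy>/<sell>,
# each holding <volume> and <avg>. The numbers are placeholders.
SAMPLE_XML = """<marketstat>
<type id="34"><buy><volume>100</volume><avg>4.25</avg></buy>
<sell><volume>200</volume><avg>4.75</avg></sell></type>
<type id="35"><buy><volume>300</volume><avg>6.5</avg></buy>
<sell><volume>400</volume><avg>7.0</avg></sell></type>
</marketstat>"""


def stats_rows(xml_text):
    """Yield one (type_id, side, volume, avg) tuple per buy/sell block."""
    soup = BeautifulSoup(xml_text, 'html.parser')
    for item in soup.find_all('type'):
        for side in ('buy', 'sell'):
            block = item.find(side)
            if block is None:  # skip a side that is missing from the response
                continue
            yield (item.get('id'), side, block.volume.text, block.avg.text)


rows = list(stats_rows(SAMPLE_XML))

# Written to an in-memory buffer here; pass an open file object to
# csv.writer instead to store the rows on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['type_id', 'side', 'volume', 'avg'])
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the type id and the buy/sell side as columns in a single file, rather than one file per item and side, makes it easy to filter later; splitting into separate files is just as workable if you prefer that layout.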

There is also a JSON API available which might be easier to use, especially when using the requests module:

import requests

url = ''    # the JSON endpoint
params = {'typeid': (34, 35), 'RegionID': 10000002}
r = requests.get(url, params=params)
data = r.json()

This gives you a list of Python dictionaries to work with:

for type_ in data:
    buy = type_['buy']
    print '{}: volume = {}, avg = {}'.format(buy['forQuery']['types'][0], buy['volume'], buy['avg'])

which prints:

34: volume = 110242267166, avg = 4.29419161677
35: volume = 40908217125, avg = 6.71507628294

Getting the type id back out of the JSON response is a bit awkward compared to XML, though.
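
One way to smooth over that awkwardness is to re-key the list by type id once, right after decoding. The sketch below assumes each entry has the shape used in the loop above, with the id at ['buy']['forQuery']['types'][0]. The sample list is a hand-built stand-in that contains only the buy fields and values shown above, not a real response; the integer ids and the helper name by_type_id are assumptions.

```python
# Hand-built stand-in for r.json(): only the fields used above are included.
sample_data = [
    {'buy': {'forQuery': {'types': [34]},
             'volume': 110242267166, 'avg': 4.29419161677}},
    {'buy': {'forQuery': {'types': [35]},
             'volume': 40908217125, 'avg': 6.71507628294}},
]


def by_type_id(data):
    """Return {type_id: entry} so later lookups do not need the nested path."""
    return {entry['buy']['forQuery']['types'][0]: entry for entry in data}


stats = by_type_id(sample_data)
print(stats[34]['buy']['volume'])
print(stats[35]['buy']['avg'])
```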