Rachel Sarah Osolen - 3 months ago
HTML Question

BeautifulSoup Scraping Span Class HTML

I am trying to scrape the contents of <span class=''> tags. The code looks like this on the pages I am scraping:

<span class="catnum">Disc Number</span>
<br>
<span class="catnum">Track Number</span>
<br>
<span class="catnum">Duration</span>

What I need are the numbers that follow each opening span tag. I should also mention that this is part of a larger script that scrapes 1200 sites, so it will have to loop over all 1200 of them, and the numbers in the quotation marks will change from page to page.

I tried this code as a test on one page:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("Smith.html"), "html.parser")

for tag in soup.findAll('span'):
    if tag.has_key('class'):
        if tag['class'] == 'catnum':
            print tag.string

I know that will print ALL the 'span class' tags and not just the three I want, but I thought I would still test it to see if it worked and I got this error:

/Library/Python/2.7/site-packages/bs4/element.py:1527: UserWarning: has_key is deprecated. Use has_attr("class") instead.
  key))


As the message says (it is a deprecation warning rather than an error), you should use tag.has_attr("class") in place of the deprecated tag.has_key("class") method.
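
Note that swapping has_key for has_attr only silences the warning. BeautifulSoup treats class as a multi-valued attribute, so tag['class'] comes back as a list such as ['catnum'], and comparing it to the string 'catnum' is always False. Below is a minimal sketch of one way around both issues, letting find_all filter on the class directly. The sample markup, its values, and the helper name catnum_values are made up for illustration and are not taken from your pages.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup mirroring the structure in the question;
# the three values are invented for this example.
html = """
<span class="catnum">1</span>
<br>
<span class="catnum">7</span>
<br>
<span class="catnum">3:45</span>
"""


def catnum_values(markup):
    """Return the stripped text of every <span class="catnum"> in markup."""
    soup = BeautifulSoup(markup, "html.parser")
    # class_ is matched against each CSS class on the tag, so there is no
    # need for a manual has_attr check or a comparison with tag['class'].
    spans = soup.find_all("span", class_="catnum")
    return [span.get_text(strip=True) for span in spans]


values = catnum_values(html)
print(values)
```

For the real pages you could pass the contents of each file to the same function inside your loop, for example catnum_values(open("Smith.html").read()).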

Hope it helps.