John Smith - 10 months ago

Python Scraping HTML Trouble

I am currently trying to scrape my internet provider's data usage. I looked for an API of some sort, but they don't have one, so I am resorting to scraping the HTML, which looks like this:

</tr><tr class="top-border"><td>17&nbsp;&nbsp;Monday</td><td class='text-right'><span class='mb'>2,991.69&nbsp;MB</span><span class='gb'>2.92&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>1,232.04&nbsp;MB</span><span class='gb'>1.20&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>4,223.73&nbsp;MB</span><span class='gb'>4.12&nbsp;GB</span></td> <td>
<div class="progress"><div class="bar bar-success" style="width: 51%;"></div></div> </td>

</tr><tr><td>18&nbsp;&nbsp;Tuesday</td><td class='text-right'><span class='mb'>3,589.42&nbsp;MB</span><span class='gb'>3.51&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>1,199.58&nbsp;MB</span><span class='gb'>1.17&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>4,789.00&nbsp;MB</span><span class='gb'>4.68&nbsp;GB</span></td> <td>
<div class="progress"><div class="bar bar-success" style="width: 57%;"></div></div> </td>


I tried using Python's regular expressions, but I can only get part of the information out of it.

import re
search = re.search("class='gb'>(.*)&nbsp;GB</span>", raw_info)  # raw_info holds the page HTML
for i in range(0, 100):
    print(search.group(i))

The output is:
class='gb'>6.88&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>1,295.90&nbsp;MB</span><span class='gb'>1.27&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>8,340.12&nbsp;MB</span><span class='gb'>8.14&nbsp;G
6.88&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>1,295.90&nbsp;MB</span><span class='gb'>1.27&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>8,340.12&nbsp;MB</span><span class='gb'>8.14

I've learned that I can't use groups like that to print all of the numbers.
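For reference: a match object only exposes the groups its pattern defines (here group 0 and group 1), so a single re.search can never hold every value, and the greedy (.*) runs from the first gb span to the last "&nbsp;GB" on the line. A minimal sketch of the regex route, assuming the HTML is already in a string (the sample below is a hypothetical fragment copied from the question's first row): re.findall returns the captured group from every non-overlapping match, and the non-greedy (.*?) stops at the nearest "&nbsp;GB".

```python
import re

# Hypothetical input: a fragment of the page HTML taken from the question's sample.
raw_info = (
    "<td class='text-right'><span class='mb'>2,991.69&nbsp;MB</span>"
    "<span class='gb'>2.92&nbsp;GB</span></td>"
    "<td class='text-right'><span class='mb'>1,232.04&nbsp;MB</span>"
    "<span class='gb'>1.20&nbsp;GB</span></td>"
)

# Non-greedy (.*?) stops at the nearest "&nbsp;GB", and findall collects
# group 1 from every match instead of only the first one.
gb_values = re.findall(r"class='gb'>(.*?)&nbsp;GB</span>", raw_info)

for value in gb_values:
    print(value)
```

Regex works for a page this regular, but it breaks easily if the markup changes, which is why an HTML parser is usually the safer choice.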

I need to extract all of the numbers in the gb spans and print them like this:



Answer

You might want to try BeautifulSoup; it's a very flexible library that can do exactly what you are looking for.

from bs4 import BeautifulSoup

html = scraped  # the page HTML you downloaded
soup = BeautifulSoup(html, "html.parser")
spans = soup.find_all('span', class_='gb')

spans is then a list of every span tag with the gb class. From there, extracting the numbers, converting them to floats, and printing them in whatever format you want is straightforward.
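A minimal sketch of that conversion step, assuming each gb span's text looks like the sample: a number, the non-breaking space that BeautifulSoup decodes from &nbsp; (U+00A0), then "GB". The helper name parse_gb and the sample strings are hypothetical. Python's str.split() treats U+00A0 as whitespace, so it separates the number from the unit, and commas are stripped in case a value has a thousands separator.

```python
def parse_gb(text):
    """Convert a gb span's text such as '2.92 GB' (with a non-breaking
    space between number and unit) to a float."""
    # split() with no argument splits on any whitespace, including U+00A0.
    number = text.split()[0]
    # Drop thousands separators such as the comma in '1,024.50'.
    return float(number.replace(",", ""))


# Hypothetical sample: the strings span.get_text() would return for the
# gb spans in the question's first row.
texts = ["2.92\u00a0GB", "1.20\u00a0GB", "4.12\u00a0GB"]
values = [parse_gb(t) for t in texts]

for v in values:
    print(f"{v:.2f} GB")
```

With the spans list from the BeautifulSoup snippet, the same conversion would be values = [parse_gb(span.get_text()) for span in spans].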