John Smith - 1 month ago
HTML Question

Python Scraping Html Trouble

I am currently trying to scrape my internet provider's data usage. I looked for an API of some sort, but they don't have one, so I am resorting to scraping the HTML, which looks like this:

</tr><tr class="top-border"><td>17&nbsp;&nbsp;Monday</td><td class='text-right'><span class='mb'>2,991.69&nbsp;MB</span><span class='gb'>2.92&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>1,232.04&nbsp;MB</span><span class='gb'>1.20&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>4,223.73&nbsp;MB</span><span class='gb'>4.12&nbsp;GB</span></td> <td>
<div class="progress"><div class="bar bar-success" style="width: 51%;"></div></div> </td>

</tr><tr><td>18&nbsp;&nbsp;Tuesday</td><td class='text-right'><span class='mb'>3,589.42&nbsp;MB</span><span class='gb'>3.51&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>1,199.58&nbsp;MB</span><span class='gb'>1.17&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>4,789.00&nbsp;MB</span><span class='gb'>4.68&nbsp;GB</span></td> <td>
<div class="progress"><div class="bar bar-success" style="width: 57%;"></div></div> </td>


etc.

I tried to use Python's re.search, but I can only get a bit of info out of it.
e.g.:

import re

search = re.search("class='gb'>(.*)&nbsp;GB</span>", raw_info)
for i in range(0, 100):
    try:
        print(search.group(i))
    except IndexError:
        break


output:

class='gb'>6.88&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>
1,295.90&nbsp;MB</span><span class='gb'>1.27&nbsp;GB</span></td></td><td class='
text-right'><span class='mb'>8,340.12&nbsp;MB</span><span class='gb'>8.14&nbsp;G
B</span>
6.88&nbsp;GB</span></td></td><td class='text-right'><span class='mb'>1,295.90&nb
sp;MB</span><span class='gb'>1.27&nbsp;GB</span></td></td><td class='text-right'
><span class='mb'>8,340.12&nbsp;MB</span><span class='gb'>8.14


I learned I can't use groups like that to print out all of the numbers.

tl;dr:
I need to print all the numbers referring to GB, like this:


2.92,1.20,4.12

3.51,1.17,4.68

Answer

You might want to try BeautifulSoup. It's a very flexible library that can do exactly what you are looking for.

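from bs4 import BeautifulSoup

html = raw_info  # the scraped page source from the question
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
spans = soup.find_all("span", attrs={"class": "gb"})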

You will then have a list of all the span tags that have the gb class. From there, extracting the numbers, converting them to floats, and printing them in whatever format you want is fairly simple.
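
As a rough sketch of that last step: the snippet below assumes each table row contributes exactly three gb spans (as in the sample HTML in the question) and that the span text looks like "2.92 GB" with a non-breaking space, which is what BeautifulSoup gives you for &nbsp;. The helper names gb_values and format_rows are made up for illustration, and the hard-coded texts list stands in for the text of the real spans.

```python
def gb_values(span_texts):
    """Convert strings such as '2.92\xa0GB' to floats."""
    values = []
    for text in span_texts:
        # &nbsp; is decoded to the non-breaking space '\xa0'
        number = text.replace("\xa0", " ").split()[0]
        # Drop thousands separators such as '1,024.50' before converting
        values.append(float(number.replace(",", "")))
    return values


def format_rows(values, per_row=3):
    """Join every `per_row` values into one comma-separated line."""
    lines = []
    for start in range(0, len(values), per_row):
        chunk = values[start:start + per_row]
        lines.append(",".join(f"{v:.2f}" for v in chunk))
    return lines


# With a list of span tags, this would be: texts = [s.get_text() for s in spans]
texts = ["2.92\xa0GB", "1.20\xa0GB", "4.12\xa0GB",
         "3.51\xa0GB", "1.17\xa0GB", "4.68\xa0GB"]
for line in format_rows(gb_values(texts)):
    print(line)
```

If the number of columns can vary, it is probably safer to loop over the tr tags and collect the gb spans inside each row instead of relying on a fixed per_row.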
