Ollie Ollie - 1 year ago 102
Python Question

£ displaying in urllib2 and Beautiful Soup

I'm trying to write a small web scraper in Python, and I think I've run into an encoding issue. I'm trying to scrape http://www.resident-music.com/tickets (specifically the table on the page). A row might look something like this:

<td style="width:64.9%;height:11px;">
<p><strong>the great escape 2017&nbsp; local early bird tickets, selling fast</strong></p>
<td style="width:13.1%;height:11px;">
<p><strong>18<sup>th</sup>&ndash; 20<sup>th</sup> may</strong></p>
<td style="width:15.42%;height:11px;">
<td style="width:6.58%;height:11px;">

I'm essentially trying to replace the &pound;55 with £55, and any other 'non-text' nasties.

I've tried a few different encoding things you can do with Beautiful Soup and urllib2, to no avail. I think I'm just doing it all wrong.


Answer Source

I used requests for this, but hopefully you can do the same with urllib2. Here is the code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

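import requests
from BeautifulSoup import BeautifulSoup

# Download the page and parse it (BeautifulSoup 3, Python 2).
soup = BeautifulSoup(requests.get('your_url').text)
# Collect every table row on the page.
chart = soup.findAll(name='tr')
# Replace the '&pound;' entity with the '£' character (code point 163).
print str(chart).replace('&pound;', unichr(163))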

Now you should get the expected output!

Sample output:


As for the parsing, you can do it in many ways. The interesting part here is print str(chart).replace('&pound;',unichr(163)), which was quite challenging :)
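If you are on Python 3, here is a minimal sketch of the same idea that decodes every named entity at once (&pound;, &nbsp;, &ndash; and so on) instead of replacing them one by one. It swaps the answer's replace() call for the standard library's html.unescape, so it is an alternative, not the method used above. SAMPLE_ROW and cell_texts are hypothetical names made up for this example, the row is modelled on the cells quoted in the question (the &pound;55 cell is assumed), and the regex tag stripping is a shortcut for the demo only. On a real page you would let Beautiful Soup extract the text.

```python
import html
import re

# Hypothetical sample row, modelled on the table cells quoted in the question.
SAMPLE_ROW = (
    "<tr>"
    "<td><p><strong>the great escape 2017&nbsp; local early bird tickets</strong></p></td>"
    "<td><p><strong>18<sup>th</sup>&ndash; 20<sup>th</sup> may</strong></p></td>"
    "<td><p><strong>&pound;55</strong></p></td>"
    "</tr>"
)


def cell_texts(row_html):
    """Return the plain text of each <td> cell, with HTML entities decoded."""
    cells = re.findall(r"<td[^>]*>(.*?)</td>", row_html, flags=re.S)
    texts = []
    for cell in cells:
        no_tags = re.sub(r"<[^>]+>", "", cell)    # drop the markup (demo shortcut)
        decoded = html.unescape(no_tags)           # &pound; -> £, &ndash; -> en dash
        decoded = decoded.replace("\u00a0", " ")  # non-breaking space -> plain space
        texts.append(" ".join(decoded.split()))   # collapse runs of whitespace
    return texts


for text in cell_texts(SAMPLE_ROW):
    print(text)
```

Because html.unescape works on any string, you can also call it on text you have already pulled out with another parser.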

