Simon Breton - 1 month ago
Python Question

Skipping Error 404 with BeautifulSoup

I'm trying to scrape some URLs with BeautifulSoup. The URLs I'm scraping come from a Google Analytics API call. Some of them aren't working properly, so I need to find a way to skip them.

I tried adding this:

except urllib2.HTTPError:
continue


But I got the following syntax error:

except urllib2.HTTPError:
^
SyntaxError: invalid syntax


Here is my full code:

rawdata = []
urllist = []
sharelist = []
mystring = 'http://www.konbini.com'
def print_results(results):
    # Print data nicely for the user.

    if results:
        for row in results.get('rows'):
            rawdata.append(row[0])
    else:
        print 'No results found'

    urllist = [mystring + x for x in rawdata]

    for row in urllist:
        # query the website and return the html to the variable 'page'
        page = urllib2.urlopen(row)
        except urllib2.HTTPError:
        continue
        soup = BeautifulSoup(page, 'html.parser')

        # Take out the <div> of name and get its value
        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
            continue
        share = name_box.text.strip() # strip() removes leading and trailing whitespace

        # save the data in tuple
        sharelist.append((row,share))

    print(sharelist)

Answer

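Two errors:
1. There is no try statement before the except clause. An except can only follow a try block.
2. There is no indentation: the call guarded by try and the continue under except each need their own indented block.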

Use this:

for row in urllist:
    # query the website and return the html to the variable 'page'
    try:
        page = urllib2.urlopen(row)
    except urllib2.HTTPError:
        continue
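
If you are on Python 3, note that urllib2 no longer exists there: its contents were split into urllib.request and urllib.error. Below is a minimal sketch of the same try/except/continue pattern under that assumption. The names fetch_pages and opener are made up for this illustration and are not part of the original code; the opener parameter is only there so the loop can be tried out without hitting the network.

```python
# Python 3 sketch: urllib2 was split into urllib.request and urllib.error.
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_pages(urls, opener=urlopen):
    """Return (url, body) pairs, skipping URLs that answer with an HTTP error.

    `fetch_pages` and `opener` are hypothetical names used for illustration.
    """
    pages = []
    for url in urls:
        try:
            response = opener(url)
        except HTTPError as err:
            # e.g. 404 Not Found: report it and move on to the next URL
            print('Skipping %s (HTTP %s)' % (url, err.code))
            continue
        pages.append((url, response.read()))
    return pages
```

Each returned body can then be handed to BeautifulSoup(body, 'html.parser') exactly as in the question. Only HTTPError is caught here; if you also want to skip unreachable hosts, you could catch urllib.error.URLError instead, since HTTPError is a subclass of it.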