Robert Birch - 11 months ago
Python Question

Combined Exceptions With Beautiful Soup HTTPError Not Defined

I'm trying to write code that scrapes data for a list of stock symbols into a CSV file. However, I get the following error:

Traceback (most recent call last):
  File "", line 23, in <module>
    page = urllib2.urlopen(""+newsymbolslist[i] +"%20Key%20Statistics").read()
  File "C:\Python27\lib\", line 127, in urlopen
    return, data, timeout)
  File "C:\Python27\lib\", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

I have tried this suggestion, which imports urllib2's HTTPError into the program, but it has not worked. (It seems redundant to do that, since I already have the module imported.)

The symbols.txt file has stock symbols. Here is the code that I am using:

import urllib2
from BeautifulSoup import BeautifulSoup
import csv
import re
import urllib
from urllib2 import HTTPError
# import modules

symbolfile = open("symbols.txt")
symbolslist = symbolfile.read()
newsymbolslist = symbolslist.split("\n")

i = 0

f = csv.writer(open("pe_ratio.csv","wb"))
# short cut to write

f.writerow(["Name","PE","Revenue % Quarterly","ROA% YOY","Operating Cashflow","Debt to Equity"])
#first write row statement

# define name_company as the following
while i<len(newsymbolslist):
    page = urllib2.urlopen(""+newsymbolslist[i] +"%20Key%20Statistics").read()
    soup = BeautifulSoup(page)
    name_company = soup.findAll("div", {"class" : "title"})
    for name in name_company: #add multiple iterations?
        try:
            all_data = soup.findAll('td', "yfnc_tabledata1")
            stock_name = name.find('h2').string #find company's name in name_company with h2 tag
            f.writerow([stock_name, all_data[2].getText(),all_data[17].getText(),all_data[13].getText(), all_data[29].getText(),all_data[26].getText()]) #write down PE data
        except (IndexError, urllib2.HTTPError) as e:
            pass
    i += 1

Do I need to define the error more specifically? Thanks for your help.

Answer Source

You are catching the exception in the wrong location. The urlopen() call throws the exception, as shown by the first lines of your traceback:

Traceback (most recent call last):
  File "", line 23, in <module>
    page = urllib2.urlopen(""+newsymbolslist[i] +"%20Key%20Statistics").read()

Catch it there:

while i<len(newsymbolslist):
    try:
        page = urllib2.urlopen(""+newsymbolslist[i] +"%20Key%20Statistics").read()
    except urllib2.HTTPError:
        i += 1  # move on to the next symbol so the loop does not retry the same one forever
        continue
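
For reference, here is a minimal sketch of the same idea as a complete loop. It assumes Python 3, where urllib2 was split into urllib.request and urllib.error; on Python 2 the equivalent names are urllib2.urlopen and urllib2.HTTPError. The names fetch_pages, build_url and opener are hypothetical helpers introduced only for this illustration, and the URL itself is left to the caller because it is not shown in the question.

```python
import urllib.error
import urllib.request


def fetch_pages(symbols, build_url, opener=urllib.request.urlopen):
    """Return {symbol: page bytes}, skipping symbols whose request fails.

    build_url and opener are hypothetical parameters for this sketch:
    build_url maps a symbol to the URL to request, and opener defaults
    to urllib.request.urlopen.
    """
    pages = {}
    for symbol in symbols:  # a for loop advances on its own, no manual index
        try:
            # The request and the read are the calls that can raise
            # HTTPError, so the try block wraps exactly these two lines.
            response = opener(build_url(symbol))
            pages[symbol] = response.read()
        except urllib.error.HTTPError as err:
            # For example "HTTP Error 400: Bad Request"; report it and move on.
            print("skipping %s: %s" % (symbol, err))
    return pages
```

Iterating with a for loop instead of while plus an index means a failed request cannot leave the loop stuck on the same symbol, and the parsing code that follows can keep its own, separate except IndexError handler.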